The Astrophysical Journal, 681:1046–1057, 2008 July 10. © 2008. The American Astronomical Society. All rights reserved. Printed in U.S.A.

A NEW GALAXY GROUP FINDING ALGORITHM: PROBABILITY FRIENDS-OF-FRIENDS

Hauyu Baobab Liu,1,2 B. C. Hsieh,2 Paul T. P. Ho,1,2,3 Lihwai Lin,2 and Renbin Yan4

Received 2007 August 9; accepted 2008 March 11

ABSTRACT

A new algorithm is developed, based on the friends-of-friends (FOF) algorithm, to identify galaxy groups in a galaxy catalog in which the redshift errors have large dispersions (e.g., a photometric redshift galaxy catalog in which a portion of the galaxies also have much more precise spectroscopic redshifts). The DEEP2 mock catalogs, with our additional simulated photometric redshift errors, are used to test the performance of our algorithm. The association of the reconstructed galaxy groups with the dark halos in the mock catalogs gives an idea of the completeness and purity of the derived group catalog. Our results show that in a 0.6 ≤ z ≤ 1.6 galaxy catalog with an R-band limiting magnitude of 24.1 and an average 1σ photometric redshift error of 0.03, the overall purity of our new algorithm for richness 4–7 (line-of-sight velocity dispersion ≲300 km s⁻¹) groups is higher than 70% (i.e., 70% of the groups reconstructed by our algorithm are related to real galaxy groups). The performance of the new algorithm is compared with that of the FOF algorithm, and it is suggested that the new algorithm performs better than FOF on such a database, given the same redshift uncertainties.

Subject headings: catalogs — galaxies: clusters: general — galaxies: halos — galaxies: high-redshift — galaxies: photometry — methods: data analysis

1. INTRODUCTION

In the past three decades, large-area and deep optical and infrared galaxy surveys (Nichol 2004; Colless et al. 2001; Davis et al. 2003; Le Fèvre et al. 2005; Ilbert et al. 2006; Dye et al. 2006; Warren et al. 2007) have revealed rich group and cluster structures on a variety of scales (see Giovanelli & Haynes [1991] for earlier work). The detection and study of large-scale structures became an important tool for improving our understanding of the universe on both cosmological and galactic scales (compact groups: Mendes de Oliveira & Hickson 1994; Longo et al. 1994; Diaferio et al. 1994; Gavernato et al. 1996; Rubin et al. 1991; galaxy mergers: Barnes 1984; Bode et al. 1993; halo occupation and cosmology: Yang et al. 2003; van den Bosch et al. 2003a, 2003b, 2004, 2007). The identification of galaxy groups by an objective group-finding algorithm produces reliable group catalogs that can be used for further detailed analysis of structures. Such group catalogs also provide candidates for follow-up observations. The earlier group-finding algorithms mostly work only on two-dimensional projected images. Relying on projected images alone clearly biases the investigation of structures. Some rich clusters (richness N ≳ 10) can be identified from their red sequences in the color and magnitude distributions of galaxies (Gladders & Yee 2000, 2005; Gal et al. 2003; Koester et al. 2007a, 2007b). The red sequences also provide precise redshift estimates for the clusters. However, as emphasized by Gal et al. (2003) and Koester et al. (2007a, 2007b), this class of group-finding algorithms can only detect clusters with a substantial population of early-type galaxies. Moreover, even for a successfully identified cluster, these methods miss its late-type cluster members.

In their FOF papers, Huchra & Geller (1982) and Geller & Huchra (1983) proposed a framework for the identification of groups in spectroscopic redshift galaxy catalogs. They also proposed corrections for background contamination and for the completeness of the resulting groups. A more detailed discussion of FOF can be found in a recent paper by Eke et al. (2004). Berlind et al. (2006) applied the FOF algorithm to the SDSS redshift survey. Their sample covers 3495.1 deg² on the sky and reaches redshifts up to 0.1. Their result provides a constraint on the halo occupation distribution. Other group-finding algorithms for spectroscopic redshift galaxies, which rely on properties of dark halos, can be found in Marinoni et al. (2002), Gerke et al. (2005), Miller et al. (2005), and Yang et al. (2005). The group-finding algorithms developed for spectroscopic redshift survey data usually suffer from incompleteness (Gerke et al. 2005). In particular, the richness and abundance of small galaxy groups are biased by the fiber-collision problem (Berlind et al. 2006). These problems become more severe toward higher redshifts, where the measurements of spectroscopic redshifts become more time consuming and the angular sizes of the structures become smaller. In these methods, the color distribution of high-redshift galaxy groups can also be biased toward the blue for spectroscopic redshift surveys, because of the higher success rates for redshift measurements of blue galaxies, which have strong emission lines. Recently, photometric redshift techniques (template fitting: Bolzonella et al. 2000; polynomial fitting: Connolly et al. 1995; Hsieh et al. 2005; ANNz: Collister & Lahav 2004; Vanzella et al. 2004) have been used to supplement the incomplete spectroscopic redshift data. In photometric redshift techniques, the estimated redshift is derived from the photometry of each object. The advantage is that obtaining the photometric data usually requires a much shorter exposure time than obtaining the spectroscopic data needed for a redshift measurement. Furthermore, without the fiber-collision problem, the field coverage of the photometric data is more complete. However, the derived photometric redshifts usually have poor accuracy (1–2 orders of magnitude lower precision than spectroscopic redshifts), and the redshift uncertainties may vary widely from galaxy to galaxy in a single photometric redshift galaxy catalog.

1 Department of Physics, National Taiwan University, Taipei 106, Taiwan.
2 Academia Sinica Institute of Astronomy and Astrophysics, P.O. Box 23-141, Taipei 106, Taiwan; [email protected], [email protected], [email protected], [email protected].
3 Harvard-Smithsonian Center for Astrophysics, Cambridge, MA.
4 Department of Astronomy and Astrophysics, University of Toronto, 50 St. George Street, Toronto, ON M5S 3H4, Canada; [email protected].


There have been few attempts to establish a statistically robust way to deal with the large redshift uncertainties, and the large variety of redshift uncertainties, in photometric redshift galaxy catalogs. In general, the relationship between redshift uncertainties and galaxy grouping criteria is a neglected area. Botzler et al. (2004) and van Breukelen et al. (2006) provide ways to identify galaxy groups and clusters in photometric redshift galaxy catalogs. But both methods are biased by the artificial redshift slices; small galaxy groups are easily missed (although this also helps to guard against the notorious "finger-of-god" effect). Due to the sparseness of spectroscopic redshift data and the large redshift uncertainties of photometric redshift data, the identifications of smaller groups always have a large interloper ratio and low purity. Based on these considerations, we aim to develop an automatic and objective group-finding algorithm that takes advantage of both the higher accuracy of the spectroscopic redshift data and the greater completeness of the photometric redshift catalog. The algorithm we introduce in this paper is a modified version of the FOF method. By taking probability into consideration,⁵ the new algorithm provides a robust way to identify galaxy groups and clusters in a galaxy redshift survey with nonuniform redshift uncertainties. The differences in photometric redshift uncertainties between bluer, fainter galaxies and redder, brighter galaxies are naturally accommodated in our new algorithm. In this way, the method also allows us to deal with a galaxy catalog in which a portion of the galaxies also have spectroscopic redshifts.

We use the DEEP2 mock catalog to test the performance of our algorithm.⁶ This mock catalog is constructed from a cosmological N-body (smoothed particle hydrodynamics) simulation. Thus, the distribution of the dark halos is known. After the N-body simulation, galaxies are assigned to the dark halos according to a halo occupation distribution (HOD), and the simulation results are converted into a mock survey that can be compared with the observational results. In the mock survey, the members of a true galaxy group are located in the same dark halo, and our algorithm aims to identify those galaxy groups separately. The galaxy groups identified by the group-finding algorithm are called the reconstructed groups. If the intersection of a reconstructed group with the true galaxy groups is lower than a certain threshold, it is identified as a spurious detection. From the number of spurious detections, we can derive the purity of the reconstructed group catalog. The abundances of the reconstructed groups and the true galaxy groups are also compared. Descriptions of the DEEP2 mock catalog can be found in Yan et al. (2004) and White et al. (2002). More about smoothed particle hydrodynamics (SPH) cosmological simulations can be found in Couchman & Thomas (1995), Springel & Hernquist (2002), Monaghan (2005), Springel et al. (2001), and Springel (2005). Springel et al. (2001), Blaizot et al. (2005), Kitzbichler & White (2007), and De Lucia & Blaizot (2007) describe in more detail how to generate a mock optical and infrared whole-sky survey from SPH cosmological simulations.

The remainder of this paper is divided into four sections. Section 2 describes the details of our group-finding algorithm: basic concept and methodology.

5 At the time this paper was submitted, a parallel work identifying galaxy groups/clusters using the photometric redshift probability distribution was presented by Li & Yee (2008). Our proposed algorithm, however, has the advantage of being applicable to galaxy catalogs that have a wider range of redshift precision.
6 We choose to classify the reconstructed groups by richness instead of mass, because richness is directly observed. One of the advantages of our algorithm is its ability to identify some low-richness groups that may be missed by group-finding algorithms that rely purely on spectroscopic redshift data.


Section 3 provides an introduction to how the mock catalogs are constructed and how the parameters of our program are optimized with the mock catalogs. Section 4 shows the performance of our algorithm on the mock catalogs, and § 5 provides a discussion of the relationship between the parameters of our algorithm and its performance. We also discuss in more detail how the galaxy redshift probability distribution functions can be constructed. An alternative that bypasses the incompleteness correction, and a strategy for selecting the complete sample, are also suggested in that section.

2. ALGORITHM: PROBABILITY FRIENDS-OF-FRIENDS

The PFOF algorithm is a modified version of the FOF algorithm (Huchra & Geller 1982). In this section, we first introduce the most relevant previous work, the FOF algorithm and the extended FOF (EXT-FOF) algorithm (Botzler et al. 2004). Then we introduce the PFOF algorithm and compare the concepts and structures of the EXT-FOF and PFOF algorithms.

2.1. FOF and EXT-FOF

2.1.1. FOF

FOF is a typical method of associating objects by physical quantities, such as position (Huchra & Geller 1982; Geller & Huchra 1983; Eke et al. 2004). Galaxies in the same galaxy group must be spatially associated and therefore must be identifiable by FOF. In FOF, two criteria,

$$D_{ij} \le D_L, \qquad (1)$$

$$V_{ij} \le V_L, \qquad (2)$$

are used to decide whether two galaxies are physically associated, where D_ij and V_ij are the distances between galaxies i and j in the projected direction and the line-of-sight direction, respectively. The terms D_L and V_L are called linking lengths, which are the thresholds for considering galaxies i and j to be related. Generically, D_L and V_L can be functions of redshift and of other local physical properties, such as the local density. These kinds of criteria are hereafter called linking criteria. Picking a galaxy α first, if a galaxy β fulfills the above two criteria, galaxy β is called a friend of galaxy α. Galaxy α, together with all its friends and the friends of its friends, forms a group and is given a unique group ID in the FOF algorithm. If a galaxy has no friend, it is identified as a field galaxy. Manifestly, the group catalog does not depend on the order in which the galaxies are picked. Another criterion,

$$N_a \ge N_c, \qquad (3)$$

can be used to filter out groups with small richness, where N_c is the richness criterion and N_a is the number of members of group a.
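The grouping step lends itself to a compact implementation. The sketch below is our own illustration (the function name, the brute-force O(N²) pair scan, and the union-find bookkeeping are not from the paper); it applies the linking criteria of equations (1) and (2) to galaxies with projected positions and line-of-sight velocities, and then filters by equation (3):

```python
import numpy as np

def fof_groups(x, y, v, d_link, v_link, n_min=3):
    """Minimal friends-of-friends sketch following eqs. (1)-(3).

    x, y   : projected comoving coordinates of the galaxies [Mpc]
    v      : line-of-sight velocities (or cz) [km/s]
    d_link : projected linking length D_L [Mpc]
    v_link : line-of-sight linking length V_L [km/s]
    n_min  : richness threshold N_c
    Returns a list of member-index arrays, one per group with >= n_min members.
    """
    n = len(x)
    parent = np.arange(n)

    def find(i):                       # union-find root with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    for i in range(n):                 # brute-force O(N^2) pair scan
        for j in range(i + 1, n):
            d_ij = np.hypot(x[i] - x[j], y[i] - y[j])   # eq. (1)
            v_ij = abs(v[i] - v[j])                     # eq. (2)
            if d_ij <= d_link and v_ij <= v_link:
                union(i, j)                             # i and j become friends

    roots = np.array([find(i) for i in range(n)])
    groups = [np.flatnonzero(roots == r) for r in np.unique(roots)]
    return [g for g in groups if len(g) >= n_min]       # eq. (3)
```

For a real catalog, a spatial tree or cell grid would replace the brute-force pair loop, but the logic is the same.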

2.1.2. EXT-FOF

The EXT-FOF algorithm (Botzler et al. 2004) is a modified FOF method, developed to identify galaxy groups or clusters in a photometric redshift galaxy catalog, whose redshift errors are larger by about 2 orders of magnitude than those of a spectroscopic redshift catalog. This algorithm divides the galaxy catalog in the line-of-sight direction into many nonoverlapping z-slices. Each z-slice has the same field of view but a different redshift range.


In each z-slice, FOF is processed with the linking criteria modified to

$$D_{ij} := 2 \sin\!\left(\frac{\theta_{ij}}{2}\right) D(z_{\rm ini}) \le D_L, \qquad (4)$$

$$V_i \le \left[\left(\frac{V_L}{2}\right)^2 + (c\,\sigma_{z_i})^2\right]^{1/2} =: V_{L,i}, \qquad (5)$$

$$V_j \le \left[\left(\frac{V_L}{2}\right)^2 + (c\,\sigma_{z_j})^2\right]^{1/2} =: V_{L,j}, \qquad (6)$$

where θ_ij is the angular separation between galaxies i and j, z_ini is the mean redshift of that z-slice, c is the speed of light, and σ_zi and σ_zj are the redshift uncertainties of galaxies i and j, respectively; σ_zi and σ_zj can be given by either the photometric redshift uncertainties or the spectroscopic redshift errors. The terms V_i and V_j are defined by⁷

$$V_i \equiv |v_i - c\,z_{\rm ini}|, \qquad (7)$$

$$V_j \equiv |v_j - c\,z_{\rm ini}|. \qquad (8)$$
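For concreteness, the line-of-sight part of the EXT-FOF criterion can be sketched as follows. This is our own illustrative rendering of equations (5)–(8) as described above, not code from Botzler et al. (2004); the names and the velocity units are assumptions:

```python
import numpy as np

C_KMS = 299792.458  # speed of light [km/s]

def extfof_los_link(v_i, v_j, sigma_zi, sigma_zj, z_slice, v_link):
    """Line-of-sight part of the EXT-FOF criterion, as in eqs. (5)-(8).

    v_i, v_j : line-of-sight velocities cz_i, cz_j of the two galaxies [km/s]
    sigma_z* : redshift uncertainties of the two galaxies
    z_slice  : mean redshift z_ini of the current z-slice
    v_link   : fiducial line-of-sight linking length V_L [km/s]
    """
    v_slice = C_KMS * z_slice
    # per-galaxy linking lengths, broadened by the redshift errors (eqs. 5-6)
    vl_i = np.hypot(v_link / 2.0, C_KMS * sigma_zi)
    vl_j = np.hypot(v_link / 2.0, C_KMS * sigma_zj)
    # distances of each galaxy to the slice centre (eqs. 7-8)
    return abs(v_i - v_slice) <= vl_i and abs(v_j - v_slice) <= vl_j
```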

After executing FOF for all the z-slices, the same linking criteria are used again to glue together the galaxy groups in different z-slices if they are spatially close to each other. From equations (5) and (6), we can see that the redshift uncertainties are taken into account in the EXT-FOF algorithm. However, the definition of σ_z is ambiguous. For example, the redshift uncertainties of galaxies may not follow Gaussian distributions, and the shapes of the redshift probability distribution functions can differ from galaxy to galaxy. In such a case, using one parameter, σ_z, undersamples the profile of the redshift uncertainties. Moreover, even if all galaxies have Gaussian redshift probability distribution functions, the choice of σ_z is somewhat arbitrary (e.g., 1σ, 2σ, 3σ, or 0σ as an extreme case). Moreover, generically, the terms (V_L/2) in equations (5) and (6) can be V_L/a and V_L/b, respectively, where 1/a + 1/b = 1 and a and b are chosen by weighting the redshift uncertainties of galaxies i and j. The special case a = b = 2 uses arbitrarily chosen values of a and b, and its statistical meaning needs more interpretation. The choice of a and b in generic cases implicitly carries the ambiguity of weighting the redshift uncertainty. The difficulties mentioned above make it hard to establish a standard for EXT-FOF, and hence group-finding results from EXT-FOF with different definitions of σ_z and different chosen a and b values are hard to compare.

2.2. Probability FOF

To overcome these disadvantages of the FOF and EXT-FOF algorithms, three major considerations enter the construction of our new algorithm.

1. At any stage of the group-finding process, each galaxy retains its redshift uncertainty.
2. The galaxy redshift uncertainties are dealt with in the linking criteria by a statistical method. The redshift uncertainty of each galaxy can be modeled independently, but the group-finding algorithm should have unified criteria that naturally accommodate data with various uncertainties.
3. In the ideal case where all galaxies have no redshift uncertainty, the performance of this algorithm converges to that of the original FOF.⁸


List points 1 and 2 deal with the uncertainties in the observational data. For a sharp and symmetric distribution function of the redshift uncertainties, the mean of that distribution function is adequate for describing it; but if the distribution function is wide and asymmetric, a more detailed description of the distribution function is necessary in order to obtain unbiased statistics. The mean of a distribution function is simply its first moment, and a more accurate accounting of the redshift uncertainty must consider the higher moments. More specifically, we are now considering the entire probability distribution function of a physical quantity, instead of just its mean, or its mean and error bar. List point 3 provides the physical meaning of the group-finding criteria. From § 2.1 we see that FOF looks for the physical association of galaxies. By requiring the new algorithm to behave as FOF in the ideal case, the function of the new algorithm can be understood.

We implement these basic concepts by calculating the probability that the distance between any two galaxies is less than the linking length. If the probability is larger than an artificial threshold, we consider the two galaxies to be physically associated; otherwise, they are not. In the line-of-sight direction,⁹ the probability that the distance between two galaxies is less than the linking length is given by

$$P(|z_2 - z_1| \le V_L) \equiv \int_0^{\infty} dz\, G_1(z) \int_{z - V_L}^{z + V_L} G_2(z')\, dz', \qquad (9)$$

where P stands for probability, G_1 and G_2 are the probability distribution functions of the two galaxies in the line-of-sight direction, and V_L is the linking length. While testing the PFOF algorithm on the mock catalogs, we model G_1 and G_2 by Gaussian distributions, with their means being the simulated photometric redshifts and with redshift errors as described in § 3.1.3. More detailed discussions of G_1 and G_2 are provided in § 5. Thus, the linking criterion is modified to

$$P(|z_2 - z_1| \le V_L) > P_{\rm th}, \qquad (10)$$

where P_th is the artificial probability threshold. The PFOF algorithm is aimed at the probability that two objects are associated, not just at the overlap of two distribution functions, as in the EXT-FOF algorithm. One of the major improvements in the PFOF algorithm is that it considers not only the width of an error distribution but also its probability amplitude, while the EXT-FOF algorithm considers only the former. Because less accurate measurements have wider error distribution functions and lower average probability amplitudes, they have less impact on the structures identified from the more accurate data in the PFOF algorithm, as compared to the EXT-FOF algorithm.
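When G_1 and G_2 are Gaussians, as in our tests, the double integral of equation (9) reduces to a closed form, because the difference z_2 − z_1 is itself Gaussian with variance σ_1² + σ_2². The following sketch is our own illustration (the names, and the choice of expressing V_L in redshift units, are assumptions); for non-Gaussian distribution functions the integral would instead be computed numerically:

```python
from math import erf, sqrt

def _norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def pfof_pair_probability(mu1, sigma1, mu2, sigma2, v_link):
    """P(|z2 - z1| <= V_L) of eq. (9) for two independent Gaussian PDFs.

    z2 - z1 is Gaussian with mean mu2 - mu1 and width sqrt(s1^2 + s2^2),
    so the double integral collapses to a difference of two CDF values.
    V_L is taken in the same (redshift) units as the means and widths.
    """
    dmu = mu2 - mu1
    s = sqrt(sigma1 ** 2 + sigma2 ** 2)
    return _norm_cdf((v_link - dmu) / s) - _norm_cdf((-v_link - dmu) / s)

def pfof_los_link(mu1, sigma1, mu2, sigma2, v_link, p_th):
    """Line-of-sight linking criterion of eq. (10)."""
    return pfof_pair_probability(mu1, sigma1, mu2, sigma2, v_link) > p_th

# e.g. two photo-z galaxies with sigma_z = 0.03, separated by dz = 0.02:
# pfof_los_link(1.00, 0.03, 1.02, 0.03, v_link=0.01, p_th=0.01)
```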

7 The line-of-sight distance can be expressed in units of length, redshift, or velocity. The original EXT-FOF algorithm used velocity units, and we simply follow that here. At low redshift, the velocity or redshift can be converted to the comoving distance using Hubble's law. At high redshift, it can be computed by integrating the relation dt = a(t) dx, where a is the scale factor in the FRW metric and x is the comoving distance.

8 The flow chart and grouping criteria for FOF and EXT-FOF can be seen in Huchra & Geller (1982) and Botzler et al. (2004).
9 In the direction perpendicular to the line of sight, our grouping criteria are the same as for the original FOF, because the measurement uncertainties in right ascension and declination are very small.


Fig. 1.—Variation of the linking lengths with redshift. The x-axis is the redshift, and the y-axis is the ratio D_L/D_0, or V_L/V_0. The circles and crosses show the D_L/D_0, or V_L/V_0, ratios of mock1 and mock2 at different redshifts; mock1 and mock2 are mock catalogs used to test our group-finding algorithm. More detailed descriptions of these two mock catalogs are given in § 3.1.2. Each mock catalog has a different number of galaxies and galaxy groups. These D_L/D_0, or V_L/V_0, ratios are the same whether the linking length is fixed in comoving coordinates or in physical coordinates.

The linking length in the line-of-sight direction, V_L, and the linking length in the plane of the sky, D_L, are given by

$$D_L = D_0 \left[\frac{n_{\rm vol}(z)}{n_{\rm flux}(z)}\right]^{1/3}, \qquad (11)$$

$$V_L = V_0 \left[\frac{n_{\rm vol}(z)}{n_{\rm flux}(z)}\right]^{1/3}, \qquad (12)$$

where D_0 and V_0 are constant fiducial linking lengths, and n_vol(z) and n_flux(z) are the number densities of galaxies at redshift z in the volume-limited and flux-limited mock catalogs, respectively. The [n_vol(z)/n_flux(z)]^{1/3} terms compensate for the decreasing survey depth of the sample with increasing redshift. More about our mock catalog can be found in § 3.1. In reality, the galaxy luminosity and the galaxy concentration in groups should evolve, but for simplicity we do not take these into consideration in our simulation. See Efstathiou (1988), Lilly et al. (1995), Lin et al. (1996, 1999), and van Dokkum & Stanford (2003) for the luminosity function and its evolution. In this paper, we use the flat ΛCDM cosmological model with Ω_M = 0.3 and h = 0.7. Both D_0 and V_0 are expressed in the length unit megaparsecs.

Figure 1 shows the variation of the linking length with redshift, derived from two different pointings in the same mock catalog, denoted mock1 and mock2 (see § 3.1.2). Their difference comes from cosmic variance. Later on we show that the results of our simulation are not sensitive to that difference. These D_L/D_0 or V_L/V_0 ratios are calculated by binning the galaxies in the mock pointings in the redshift direction and then calculating the value of [n_vol(z)/n_flux(z)]^{1/3} in each redshift bin. Here n_vol(z) is the total number of galaxies in a mock pointing in the redshift bin with central redshift z, and n_flux(z) is the number of galaxies with apparent magnitudes smaller than 24.1 in that redshift bin. It is clear that if all galaxies have no redshift uncertainties, then G_1 and G_2 in equation (9) become delta functions, and equation (10) is equivalent to equation (2) for any nonzero P_th.

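As a concrete illustration of equations (11) and (12), the redshift-dependent linking lengths can be tabulated from the binned counts described above. The sketch below is our own (the bin edges, the guard against empty bins, and the function name are illustrative choices, not part of the paper):

```python
import numpy as np

def linking_lengths(z_vol, z_flux, d0, v0, z_edges):
    """Per-bin linking lengths D_L and V_L from eqs. (11)-(12), sketched.

    z_vol  : redshifts of galaxies in the volume-limited mock sample
    z_flux : redshifts of galaxies in the flux-limited sample (R < 24.1)
    d0, v0 : fiducial linking lengths D_0 and V_0
    z_edges: redshift bin edges; counts in the same bins share the same
             volume, so their ratio equals the ratio of number densities.
    """
    n_vol, _ = np.histogram(z_vol, bins=z_edges)
    n_flux, _ = np.histogram(z_flux, bins=z_edges)
    ratio = (n_vol / np.maximum(n_flux, 1)) ** (1.0 / 3.0)  # [n_vol/n_flux]^(1/3)
    return d0 * ratio, v0 * ratio
```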
3. OPTIMIZING WITH MOCK CATALOGS

A galaxy group (or cluster) finder is an algorithm that aims to find physically associated sets of galaxies. It is difficult to define "physically associated" in an observational galaxy catalog, owing to the sparseness with which the underlying (dark) matter distribution is sampled and the lack of knowledge of the true motion of each object. For example, even if several galaxies fulfill the linking criteria of a group-finding algorithm and are identified as a group, it is still possible for them to have very large relative velocities in the projected plane of the sky. If the velocity dispersion of those galaxies far exceeds the virial condition, then physically they are weakly associated, if at all. The peculiar velocities also generate some uncertainties in the measurement of positions in the line-of-sight direction. The observational errors further degrade the quality of the information that we can obtain. To derive physical parameters from the limited information, and to interpret the group-finding results, we need a model for how those galaxies, together with their observational errors, sample the underlying dark matter density.

One way to interpret the group-finding result is to take as a reference the mock catalogs, which are generated from a cosmological N-body simulation. If the statistics of the mock catalog, such as the galaxy density, two-point correlation function, and luminosity function, agree with observations, then the N-body simulation represents the real world appropriately. By comparing the group-finding results for the mock catalogs with the distribution of the matter overdensity regions, we can test how good the group-finding algorithm is and how the observational errors affect the result. The term "real galaxy group" in our group-finding simulations refers to a set of galaxies in which all the members belong to the same dark matter halo in a cosmological N-body simulation. More about the identification of dark matter halos in a cosmological N-body simulation can be found in Springel et al. (2001). The following subsections introduce our mock catalogs and describe how we choose the parameters according to the mock catalogs. The "reconstructed groups" refer to the galaxy groups identified by the algorithm from the mock catalogs.

3.1. The DEEP2 Mock Catalogs

3.1.1. The DEEP2 Galaxy Redshift Survey

The DEEP2 Galaxy Redshift Survey (Davis et al. 2003) was a 3 year program using the DEIMOS spectrograph on the Keck II telescope, covering 3 deg² of sky in four widely separated fields to a limiting magnitude of R_AB = 24.1. Three-band photometry was obtained with the CFH12K camera on the Canada-France-Hawaii Telescope (CFHT; see Coil et al. 2004). In three of the four fields, a color-color cut is applied to select galaxies at z > 0.75. Each of the three fields consists of two or three CFHT pointings of size 0.7 × 0.5 deg². The fourth field, called the extended Groth strip (EGS), covers 2.0 × 0.25 deg². Galaxies in the EGS are targeted for spectroscopy regardless of color. Statistical tests done on the fourth field show that the selection of high-z galaxies by the color-color cut is successful.


In total, about 60% of the galaxies that meet the target selection criteria are targeted with the DEIMOS spectrograph on the Keck II telescope. The overall redshift success rate is about 70%.

3.1.2. Profile of the DEEP2 Mock Catalogs

The DEEP2 mock catalogs are generated to simulate the observational results of the DEEP2 galaxy redshift survey (White 2002). The details of the mock catalogs can be found in Yan et al. (2004). We briefly summarize them here. First, an HOD function is derived from the observed galaxy correlation function and luminosity function (Yan et al. 2003). The HOD specifies the expected number of galaxies in each dark matter halo according to its mass. With dark matter halos identified in the N-body simulation, a number of simulation particles in each halo are then tagged as galaxies. The resulting low-order clustering properties in the mock catalogs are found to be consistent with the observed results from the DEEP2 survey. Therefore, these catalogs are suitable for our study. We use an updated version of the DEEP2 mock catalogs, which contains 12 independent fields of 2.1 × 0.5 deg², generated with the same cosmological parameters and with similar statistics. Each field consists of three pointings, each of which is 0.7 × 0.5 deg². All of the 36 mock pointings have the same field of view as the CFHT pointings. The mock catalogs are constructed from an N-body simulation that uses a flat ΛCDM cosmology, with cosmological parameters Ω_M = 0.3, σ_8 = 0.9, spectral index n = 0.95, and h = 0.7. The redshifts of the galaxies range from around 0.6 to 1.6, and the mass resolution is around 10¹⁰ M_⊙ h⁻¹. The foreground roll-off of the redshift distribution mimics the photometric preselection adopted by DEEP2 in three of the four fields. Each single catalog contains two samples: a volume-limited sample containing galaxies with luminosities greater than 0.1L*(z) and an "observed" sample including only galaxies that pass the DEEP2 target selection.¹⁰ The latter is flux-limited down to R_AB = 24.1 and has a redshift distribution similar to the observed sample in the three DEEP2 fields with photometric preselection. We use the following information for each galaxy in the mock catalog: projected coordinates (R.A., decl.), spectroscopic redshift (spectro-z), apparent R-band magnitude (m_R), and a group ID describing which dark halo it occupies.

In practice, only a portion of the galaxies can be targeted for spectroscopy because of crowding, and not all galaxies that are targeted for spectroscopy yield successful spectroscopic redshift measurements. To mimic the observations, galaxies are selected by the DEEP2 mask-making algorithm, achieving a target rate of 60% for spectroscopy. Another algorithm, called fakezsuccess and provided by the DEEP2 group, is used to simulate the redshift success rate as a function of R magnitude. In the fakezsuccess algorithm, galaxies are binned by their apparent magnitudes m_R. Bins with average apparent magnitudes m_R < 22.6 have the same redshift success rate; otherwise, the redshift success rate decreases linearly with increasing apparent magnitude. From m_R = 22.6 to 24.1, the redshift success rate drops by 15%. The overall redshift success rate is 70%. The total number of galaxies with successful spectroscopic redshift measurements is the total number of galaxies times the target rate times the redshift success rate.

10 One L* corresponds to the characteristic rest-frame absolute magnitude M* ≈ −19.5. With the cosmological parameters we use, the luminosity distances at redshifts 0.6, 1.0, and 1.6 are about 3500, 6650, and 12,000 Mpc, respectively. Without considering the K-correction, 0.1L* corresponds to apparent magnitudes of 24.5, 26.0, and 27.5 at these three redshifts.


Hence, about 40% of the galaxies in a mock catalog have spectroscopic redshifts. Since we use versions of the DEEP2 mock catalogs that contain simulated photometry only in the R band, we do not have photo-z for the galaxies. In order to test how the photo-z errors affect the performance of the PFOF algorithm, we perform a further simulation to generate photo-z for the simulated galaxies in the flux-limited samples.

3.1.3. The Simulated Photometric Redshift

We adopt a realistic approach in our photo-z simulation, which allows for a variable photo-z error distribution. In the simulation, we first assign to each galaxy a distribution of redshift errors and then offset the galaxy from its spectro-z with a random number generator, in accordance with its distribution of redshift errors. Because the photometric redshift techniques do not provide reliable redshift estimates for all galaxies, we define a parameter f, called the "bad photo-z fraction," in our photo-z simulation. This fraction of galaxies corresponds to galaxies that have photometric redshift errors larger than 3σ of the photometric redshift errors in the observational data, which sometimes have unpredictable behaviors in their errors (Hsieh et al. 2005). Suppose the total number of galaxies in the flux-limited sample is N_f; then f N_f galaxies have their photo-z simulated by a uniform distribution over our redshift range. To simulate the photo-z of the other (1 − f)N_f galaxies, two random processes, σ-var and z-var, are used. For these (1 − f)N_f galaxies, the redshift probability distribution function of each galaxy is modeled by a Gaussian distribution (Hsieh et al. 2005). The z-var process generates the photo-z for each galaxy from a Gaussian distribution. Recalling § 2.2, the entire redshift probability distribution functions are considered in PFOF, not just their means. Hence the standard deviation of each distribution function affects the group-finding result and needs to be simulated, so that galaxies have distribution functions of different widths. The σ-var process assigns the standard deviation of the Gaussian distribution that is used in the z-var process, where the σ-var process is also modeled by a Gaussian distribution. The means of these two random processes are denoted by m_σ and m_z, and the standard deviations are denoted by σ_σ and σ_z, respectively. In general, the photo-z of each galaxy can be generated from a different set of σ_σ, σ_z, m_z, and m_σ. For simplicity, we assume that all galaxies have the same m_σ and σ_σ values. For each galaxy i with spectro-z = z_i, its m_z is set to z_i. The spectroscopic redshift error is also simulated in the same way, where m_σ is set to be much smaller and σ_σ is set to zero. We set m_σ and σ_σ according to the photometric redshift performance expected from modern deep multiwavelength surveys (Mobasher et al. 2007; J.-S. Huang et al. 2008, in preparation). The m_σ is set to 0.03 for the simulated photometric redshift error and to 0.0001 for the simulated spectroscopic redshift error. The σ_σ is set to 0.02 in the photo-z simulation. The value of f is set to 4%, and in our simulations around 40% of the galaxies have successful spectroscopic redshift measurements. With these values, fewer than 6% of the galaxies may have negative σ_z in the σ-var process, and we simply take the absolute values. Figure 2 shows how the simulated photo-z look for mock1, and their profile fits the photometric redshift performance from observations quite well.
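A compact sketch of this two-step photo-z simulation is given below. It is our own illustrative implementation; the default parameter values follow those quoted above (m_σ = 0.03, σ_σ = 0.02, f = 4%, redshift range 0.6–1.6), while the function name and the random-number interface are assumptions:

```python
import numpy as np

def simulate_photoz(z_spec, m_sigma=0.03, s_sigma=0.02, f_bad=0.04,
                    z_range=(0.6, 1.6), rng=None):
    """Sketch of the photo-z simulation of Sec. 3.1.3.

    sigma-var: draw a per-galaxy error width from N(m_sigma, s_sigma);
               negative draws are replaced by their absolute values.
    z-var    : offset each galaxy from its spectro-z by N(0, sigma_i).
    A fraction f_bad of galaxies gets a photo-z drawn uniformly over
    the survey redshift range ("bad photo-z" objects).
    Returns (photo_z, sigma_z) arrays.
    """
    rng = np.random.default_rng(rng)
    z_spec = np.asarray(z_spec, dtype=float)
    n = len(z_spec)
    sigma = np.abs(rng.normal(m_sigma, s_sigma, size=n))   # sigma-var
    photo_z = rng.normal(z_spec, sigma)                     # z-var
    bad = rng.random(n) < f_bad                             # bad photo-z objects
    photo_z[bad] = rng.uniform(*z_range, size=bad.sum())
    return photo_z, sigma
```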


Fig. 2.—Simulated photometric redshifts for mock1, based on the parameters described in § 3.1.3.

3.2. Definition of Successful Detection

In this paper we set N_c (eq. [3]) to 3. We use the largest group fraction (LGF; see Marinoni et al. 2002; Gerke et al. 2005) to check whether a reconstructed group is a successful detection.

To understand the LGF, first note that, in the group-finding result for a mock catalog, a reconstructed group is composed of field galaxies and galaxies presumably belonging to different real groups. Here the real groups refer to the galaxy groups provided by the N-body simulation from which our mock catalog is constructed. If G is the set of all galaxies in a reconstructed group, and G′ is the real group that contains a plurality of the galaxies in G, then the LGF (L_G) is defined as

$$L_G = \frac{N(G \cap G')}{N(G)}, \qquad (13)$$

where N(A) denotes the number of galaxies in the set A. In this paper we follow Gerke et al. (2005) and define a successfully reconstructed group as one that achieves L_G ≥ 0.5; the purity is the fraction of successfully reconstructed groups out of all the reconstructed groups. Groups with richness less than 3 are filtered out from both the mock catalogs and the reconstructed group catalogs before the evaluation of L_G, so our purity for N ≥ 4 reconstructed groups is not biased high by the abundant N = 2 real groups.
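The LGF and purity bookkeeping can be expressed as a short sketch. This is our own illustration; representing real group membership as an integer ID per galaxy, with −1 for field galaxies, is an assumed convention:

```python
import numpy as np

def largest_group_fraction(rec_members, true_group_id):
    """LGF of eq. (13) for one reconstructed group.

    rec_members   : indices of the galaxies in the reconstructed group G
    true_group_id : NumPy array of per-galaxy real (halo) group IDs,
                    with -1 marking field galaxies
    """
    ids = true_group_id[rec_members]
    ids = ids[ids >= 0]                        # field galaxies never match G'
    if ids.size == 0:
        return 0.0
    _, counts = np.unique(ids, return_counts=True)
    return counts.max() / len(rec_members)     # N(G intersect G') / N(G)

def purity(rec_groups, true_group_id, lgf_min=0.5):
    """Fraction of reconstructed groups with LGF >= lgf_min."""
    flags = [largest_group_fraction(g, true_group_id) >= lgf_min
             for g in rec_groups]
    return float(np.mean(flags)) if flags else 0.0
```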

3.3. Strategy for Optimization

In this investigation, we carry out two different simulations for group finding: (1) fixing the linking length in comoving coordinates, and (2) fixing the linking length in physical coordinates (locally). More discussion on the different choices of linking length is provided in § 5. In the following, our optimization strategy focuses on the performance of reconstructed groups with richness ∼5, rather than on smaller or larger groups. The reasons are (i) the group-finding performance is very uncertain for smaller reconstructed groups with the redshift uncertainty used in our simulation; (ii) there are too few larger groups in our field of view, and therefore our analysis cannot provide satisfactory statistics; and (iii) it is easier to glue fragmentary reconstructed groups back into a complete large group than to do the reverse.

There are three free parameters in the PFOF algorithm: D_0, V_0, and P_th. The parameter space is much larger than for the original FOF algorithm, and the optimization of the parameter P_th is more subtle than that of D_0 and V_0. It is therefore difficult to decouple the effects of P_th and V_0. Our strategy for the optimization is:

1. Optimize D_0 and V_0 by applying the original FOF algorithm to a flux-limited mock catalog, where all the galaxies in the mock catalog are assumed to have spectroscopic redshifts.
2. Use the D_0 and V_0 obtained from step 1 as fixed parameters in PFOF, and optimize the value of P_th. In this stage, the mock catalog is the same one as used previously, but only a certain portion of the galaxies in it have spectroscopic redshifts.
3. Adjust D_0, V_0, and P_th around the values obtained in the previous two steps to get the final optimal parameters.

The first step is to see how the separations of members in galaxy groups are characterized by the linking lengths. The second step is to simulate the effects of the redshift measurement errors on our algorithm with a mock catalog in which spectroscopic information is not always available. The last step takes into account that, in a catalog with larger photometric errors, the spatial regions occupied by members of different groups tend to overlap with each other. In order to avoid the overmerging of groups,¹¹ it is necessary to set D_0 several kiloparsecs smaller; in our experience, in PFOF, setting V_0 to be about 10–20 Mpc larger provides better group-finding performance. The reasons for setting V_0 larger are discussed in § 5.3.1. Note that, due to cosmic variance, there will be a range of reasonable values for all these physical size scales and measurement error parameters.¹²

4. RESULTS

We optimize the PFOF and obtain results from two of the 36 independent DEEP2 mock catalogs. The two mock catalogs have the same field of view but different numbers of galaxies and galaxy groups. For each mock catalog, we run 10 trials to examine the stability of the statistics of the group-finding algorithm and to see whether the simulated photometric redshift errors tend to retain or destroy most of the structures. In each trial, the standard deviation of the photometric redshift error, and the photometric redshift error itself for each single galaxy, is resimulated. That is, we generate 10 simulated photometric redshift galaxy catalogs for each of the two mock catalogs from the cosmological N-body simulation. We then find galaxy groups in those 10 simulated photometric redshift galaxy catalogs. The mean purities and the standard deviations of the purities are then provided in Figures 3–5, where the standard deviations of the purities are shown as the error bars for our results. The differences in mean purities between the two mock catalogs are smaller than the error bars. Therefore, we show the results for the mock1 catalog only. We fix the linking lengths in physical coordinates and in comoving coordinates, respectively, in order to find out which case provides the better group-finding results. In fixing the linking lengths, the D_0 and V_0 in equations (11) and (12) are independent of redshift in their respective coordinates.

11 The case where several independent groups are glued together by the group-finding algorithm.
12 The typical radius of a single galaxy is about 10 kpc; hence the linking length D_0 cannot be determined more precisely than this order of magnitude. For the same reason, in the line-of-sight direction, due to the velocity dispersion of galaxies (typically >300 km s⁻¹), V_0 can only be determined to a precision of several megaparsecs.


Fig. 3.—Purities and number counts vs. richness for the PFOF-reconstructed groups from mock1, in which 40% of the galaxies have successful spectroscopic redshifts and the linking lengths are fixed in comoving coordinates. Panels a–d show the purities for different PFOF parameters, where the horizontal axes are the richness and the vertical axes are the purity of the reconstructed groups. Panels e–h show the number counts of the real and successfully reconstructed groups, where the horizontal axes are the richness and the vertical axes are the number counts of the successfully reconstructed groups. The dotted lines in (e)–(h) show the number counts of the real groups, and the solid lines show the number counts of the successfully reconstructed groups. The parameters (D_0 [kpc], V_0 [Mpc], P_th) are (250, 20, 0.01) in (a) and (e), (250, 20, 0.00001) in (b) and (f), (250, 30, 0.01) in (c) and (g), and (250, 30, 0.00001) in (d) and (h).

Table 1 lists the optimal D_0 and V_0 we obtained with FOF (step 1 in § 3.3). We use the optimized parameters from FOF as our first guess for PFOF. After varying these parameters around the values optimized from FOF, the optimal parameters for PFOF are obtained. Figure 3 shows the results of the group-finding simulation with the PFOF algorithm, where the linking lengths are fixed in comoving coordinates and around 40% of the galaxies have successful spectroscopic redshift measurements. Figure 4 is similar to Figure 3, but no galaxies have successful spectroscopic redshift measurements. Figure 5 is also similar to Figure 3, but the linking lengths are fixed in physical coordinates. After testing a broad range of parameters (V_0 = 10–50 Mpc for the comoving coordinate cases; V_0 = 5–25 Mpc for the physical coordinate cases; P_th from 0.1 to 0.000001), we found that when fixing the linking lengths in physical coordinates, D_0 ≈ 130 kpc and V_0 ≈ 8–13 Mpc optimize our group-finding results; in comoving coordinates, D_0 ≈ 250 kpc and V_0 ≈ 20–30 Mpc optimize our group-finding results. The results for P_th = 0.01 and 0.00001 are shown. Usually there is a trade-off between the purity and the recovery rate: using a larger P_th, the reconstructed group catalog is expected to have higher purity but to recover fewer real galaxy groups, and vice versa.

TABLE 1
Optimized Linking Lengths for FOF Mock Catalogs

  Parameter        D_0 (Mpc)   V_0 (Mpc)
  mock1 com. ....    0.25        12.0
  mock1 phys. ...    0.14         4.5

Notes.—Without simulated redshift error; "com." denotes fixing D_0 and V_0 in comoving coordinates; "phys." denotes fixing D_0 and V_0 locally in physical coordinates. The original FOF algorithm is used to generate these optimized D_0 and V_0 values.

From our simulations in Figures 3–5, it is also verified that the small differences in the linking lengths, as in Figure 1, should not change the performance of the PFOF algorithm by much. Figure 6 shows some results of FOF with various parameters for comparison. From Figure 3, we can see that, using the PFOF method, the purity of richness = 4, 5, 6, 7 groups can be higher than 70%. By comparing Figures 3 and 4, we can see that the purities are slightly higher in the catalogs in which around 40% of the galaxies have successful spectroscopic redshifts than in the catalogs with no spectroscopic redshift measurements. But the PFOF algorithm recovers more groups in the catalogs with no spectroscopic measurements than in the catalogs with 40% spectroscopic redshift measurements. By comparing Figures 3 and 4 with Figure 5, we find that fixing the linking lengths in comoving coordinates may give better purities than fixing them in physical coordinates (at least in the parameter space that we have surveyed). Figure 6 shows the results of FOF, where we use the same catalogs as in Figure 3 (the same photo-z errors, and 40% of the galaxies have successful spectro-z). It can be seen that the PFOF algorithm obtains significantly higher purities than the original FOF algorithm. The small error bars together with the low mean purity in the FOF results from multiple trials indicate substantial false identifications.


Fig. 4.—Similar to Fig. 3, but no galaxies are targeted for the spectroscopic redshifts.

The purities are greatly affected by the quality and properties of the data and are not determined by the group-finding algorithm alone. For example, (1) the redshift errors and (2) the survey depth of the galaxy catalog can affect the performance of the PFOF algorithm. The structures are smoothed by the redshift errors, and detection becomes more difficult when the survey is not deep or complete enough. Both effects make the group and field environments more and more indistinguishable. Hence, although we use the DEEP2 mock catalogs, the statistics of our results should not be compared directly with previous DEEP2 group-finding results (Marinoni et al. 2002; Gerke et al. 2005), because those works used only galaxies with spectroscopic detections, while we take all galaxies with R < 24.1.

We choose the parameters D_0 = 250 kpc and V_0 = 20 Mpc, fixed in comoving coordinates, and P_th = 0.01 as our final optimal parameters. We provide a look-up table for the optimal parameters to convert the number of successfully reconstructed groups into the number of real groups. Table 2 is the look-up table, and the gain ratio is the value

$$({\rm gain\ ratio})_n = \frac{({\rm No.\ of\ real\ groups})_n}{({\rm No.\ of\ successfully\ reconstructed\ groups})_n},$$

where the subscript n denotes the richness. The inverse of the gain ratio is typically what is referred to as completeness on cluster and group scales.

TABLE 2
Look-up Table for Converting the Number of Successfully Reconstructed Groups to the Number of Real Groups

  Richness    Gain Ratio
  3 .........  1.58 (+0.17, −0.14)
  4 .........  1.54 (+0.19, −0.16)
  5 .........  3.16 (+1.2, −0.68)
  6 .........  2.19 (+2.74, −0.78)
  7 .........  2.50 (+7.5, −1.07)

Note.—For richness larger than 7, we think that the error bars are so large that the result is unreliable.
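As a hedged illustration of how the look-up table would be applied, the sketch below uses only the central gain-ratio values from Table 2 (the asymmetric error bars are ignored here, and the function name is our own):

```python
# Central gain-ratio values from Table 2 (error bars omitted in this sketch).
GAIN_RATIO = {3: 1.58, 4: 1.54, 5: 3.16, 6: 2.19, 7: 2.50}

def estimated_real_groups(n_reconstructed, purity, richness):
    """N_real ~ N_rec * purity * gain ratio, as described in the text."""
    return n_reconstructed * purity * GAIN_RATIO[richness]

# e.g. 40 reconstructed richness-5 groups at 70% purity:
# estimated_real_groups(40, 0.7, 5)  # -> about 88 real richness-5 groups implied
```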

That is, if the number of reconstructed groups with richness = 5 is N_5, the purity for richness = 5 groups is f_5, and their gain ratio is r_5, then the number of real groups with richness = 5 is N_5 × f_5 × r_5. If the DEEP2 mock catalogs are good models of the real world, the look-up table can be used to estimate the number of real groups from the number of reconstructed groups.

5. DISCUSSION

Because we aim to use hybrid data with both photometric and spectroscopic redshifts, and we use the concept of probability, some considerations arise about the properties of the data and the output group catalog. There is always an issue of how to quantify the results when the samples in the data have different data qualities. In § 5.1 we briefly discuss and make some suggestions for the output quantities of the groups. Section 5.2 discusses the strategy for selecting a complete sample when the samples are sparse. The concept of the PFOF algorithm is restated in § 5.3.1, and another possible way to construct the probability distribution functions G_1 and G_2 in equation (9) is suggested, while § 5.3.2 briefly mentions the possibility of further modifying the PFOF algorithm. Another issue is that some samples are discarded because of their poor quality. The data thus become incomplete and must be corrected. However, this incompleteness correction may be artificial and can bias the results. In § 5.4 we suggest a possible alternative to the incompleteness correction that uses the concept of probability.


Fig. 5.—Similar to Fig. 3. Around 40% of the galaxies have successful spectroscopic redshifts, and the linking lengths are fixed in physical coordinates. The parameters (D_0 [kpc], V_0 [Mpc], P_th) are (130, 8, 0.01) in (a) and (e), (130, 8, 0.00001) in (b) and (f), (130, 13, 0.01) in (c) and (g), and (130, 13, 0.00001) in (d) and (h).

5.1. Relating PFOF to Physical Properties

In our development of the group-finding algorithm, one of the most difficult tasks is to properly weight the errors in the redshift of each galaxy while calculating their projected distance. In our work, we simply average their mean redshifts and calculate their projected comoving distance (or projected physical distance) according to their projected coordinates (R.A., decl.). Although this conflicts with list point 1 in § 2, with our assumed redshift errors this problem contributes only a minor error to the calculation of the projected distance, provided the two galaxies satisfy the linking criterion in the line-of-sight direction (eq. [9]).

After the identification of galaxy groups, one should be able to give a quantitative description of the physical properties and statistics of the galaxy groups, such as position, richness, color, and correlation function. But how to weight the contributions of the physical quantities of each group member requires further study. The weighting W of a galaxy i could be a function of the probability of grouping P, the redshift uncertainty of that galaxy σ_zi, the quantity that reflects the survey depth of the sample, D_L/D_0, or other quantities. For example, W could be the inverse of σ_zi,

$$W_i(\sigma_{z_i}) \propto \frac{1}{\sigma_{z_i}},$$

and the redshift of a galaxy group could be

$$\frac{\sum_i W_i z_i}{\sum_i W_i},$$

where i runs over the members of the galaxy group and z_i is the redshift of each member. Some modifications of the biweight method (Lax 1985; Beers et al. 1990) may also be considered.
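A minimal sketch of the suggested error weighting follows. It is our own illustration; it simply implements W_i ∝ 1/σ_zi and the weighted mean written above, and is not a prescription from the paper:

```python
import numpy as np

def group_redshift(z_members, sigma_z_members):
    """Error-weighted group redshift with W_i proportional to 1/sigma_z_i."""
    w = 1.0 / np.asarray(sigma_z_members, dtype=float)
    return float(np.sum(w * np.asarray(z_members, dtype=float)) / np.sum(w))
```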


Fig. 6.—Purities and number counts vs. richness for the FOF-reconstructed groups from mock1, in which 40% of the galaxies have successful spectroscopic redshifts and the linking lengths are fixed in comoving coordinates. The parameters (D_0 [kpc], V_0 [Mpc]) are (250, 6) in (a) and (e), (250, 12) in (b) and (f), (250, 20) in (c) and (g), and (250, 30) in (d) and (h).

5.2. Selection of Complete Sample

Data that cover a wide range in redshift can be used to study the evolution of structure. Unfortunately, in addition to the intrinsic evolution of the structures, the observational selection bias (e.g., the limiting magnitude) also makes the population of the sample at lower redshifts different from the population at high redshifts. A way to eliminate the selection bias is to remove part of the sample, such that the sample has the same population along the redshift. Many statistical studies of galaxy properties are conducted on complete samples, which are made of galaxies with luminosity or mass greater than a threshold in their original galaxy catalogs. However, the completeness of the sample can also affect the performance of the group-finding algorithms. Selecting the complete sample before the identification of galaxy groups inevitably reduces the galaxy number density in the sample, and hence reduces the completeness of galaxy groups, as well as the purity, in the sample. To identify galaxy group candidates for follow-up observations, we therefore suggest a different strategy.

We can first identify galaxy groups in the original galaxy catalog and then select the complete sample from the members of those galaxy groups. The galaxy groups without enough members to fulfill the threshold for selecting the complete sample can then be abandoned. However, if one selects the complete sample after identifying galaxy groups, the group abundance and purity may have a systematic bias with respect to redshift, and any statistical study as a function of redshift therefore cannot be used directly. Although this strategy biases the statistical results along the redshift direction, we can find more group candidates with higher purity. And with a good model that predicts the redshift dependence of the selection effects, the group-finding results with higher overall purities and higher completeness could presumably be corrected to provide better constraints on the group properties.

5.3. More about the Probability

5.3.1. The Implementations of the Probability Distribution Function

In this section we clarify some concepts in the PFOF algorithm and explain our way of implementing it on the data. This discussion essentially relates to the probability distribution functions G_1 and G_2 in equation (9). While our simulation is specialized to one possible construction of G_1 and G_2, the more general concept behind the implementations of the probability distribution function is restated.


Fig. 7.—Spectroscopic redshift distribution of galaxies with simulated photometric redshift 1.02 ≤ photo-z ≤ 1.08 and 1.22 ≤ photo-z ≤ 1.28 in mock1.

The implementation of the PFOF algorithm has some flexibility, because the algorithm is aimed at imperfect data. For some objects, the redshifts have to be constrained by empirical methods (e.g., photometric redshifts) or determined statistically. The way to constrain the redshifts is not unique and depends on the currently accessible data. By using photo-z codes to estimate the likelihood of a galaxy being at some redshift from its photometry, a plausible form of G_1 and G_2 in equation (9) can be constructed. However, if there are sufficiently complete samples of spectroscopic redshift detections, the spectroscopic redshift distribution of the galaxies within a certain range of photometric redshift is also a possible statistical construction of the probability distribution function. This makes sense if the redshift distribution of the galaxies that we did not target for spectroscopic redshifts does not differ substantially from the redshift distribution of our spectroscopic redshift sample. It becomes clearer if we look at Figure 2. A void can be seen within the redshift range 0.95 < z < 1.0. So when we see a galaxy with photometric redshift 0.98, we probably would not say it is most likely a z = 0.98 galaxy, but would rather determine its probability distribution function statistically. For example, Figure 7 shows the histograms of galaxies in Figure 2 with simulated photometric redshifts 1.02 ≤ photo-z ≤ 1.08 and 1.22 ≤ photo-z ≤ 1.28, respectively, along the spectroscopic redshift direction. After normalizing the sum of the histogram to one, and probably with some kind of smoothing, it can be used as G_1 and G_2 in equation (9). The narrow spikes in Figure 7 should correspond to local overdensities. Because they are narrower than our 1σ photometric redshift error (σ_z = 0.03), they show the potential to resolve the structures better than the photometric redshift alone.

Our discussion above and Figure 7 also show that, on size scales at which the galaxies cannot be viewed as uniformly distributed, the G_1 and G_2 obtained from photometric redshift codes are offset from the most likely redshifts, and their Gaussian shapes are distortions of the real redshift distributions. Although on large scales (several hundred megaparsecs) the galaxy density is not disturbed much, locally the average separation of galaxies in galaxy groups is enlarged by this offset and distortion, such that we need to use larger linking lengths to recover the galaxy groups and group members in PFOF. That is why we set the linking lengths ∼10 Mpc larger after optimizing them with the FOF algorithm in a mock catalog in which all the galaxies have successful spectroscopic redshifts. For observations in which the available measurements of spectroscopic redshifts are insufficient, or in which the distribution of the spectroscopic redshift measurements is biased, the assumption of a uniform distribution for G_1 and G_2 may be acceptable. But if the spectroscopic redshift measurements form a fair random sample of all the galaxies, such that G_1 and G_2 can be better confined, then using the distribution functions obtained from the photometric redshift codes should only give a lower limit on the performance of PFOF. That is, if we interpret the probability distribution function properly, such that the information contained in the data is used more efficiently, the performance of the PFOF algorithm can still be enhanced. Although the PFOF algorithm was originally developed for finding galaxy groups in a photometric redshift galaxy catalog in which the redshift errors have a large dispersion, it is actually not restricted to the information given by photometric redshift programs.
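A possible construction of such an empirical distribution function, along the lines of Figure 7, is sketched below. It is our own illustration; the ±0.03 photo-z window, the binning, and the fallback to a flat distribution for empty selections are assumptions:

```python
import numpy as np

def empirical_redshift_pdf(photo_z_target, photo_z_spec_sample, z_spec_sample,
                           window=0.03, z_edges=None):
    """Empirical G(z) for one galaxy, built as suggested in Sec. 5.3.1.

    Galaxies that have both a photo-z and a spectro-z and whose photo-z
    lies within +/- `window` of the target's photo-z are histogrammed
    along the spectro-z axis; the normalized histogram plays the role of
    G_1 or G_2 in eq. (9). Window width and binning are illustrative.
    """
    if z_edges is None:
        z_edges = np.linspace(0.6, 1.6, 201)
    photo_z_spec_sample = np.asarray(photo_z_spec_sample)
    z_spec_sample = np.asarray(z_spec_sample)
    sel = np.abs(photo_z_spec_sample - photo_z_target) <= window
    hist, edges = np.histogram(z_spec_sample[sel], bins=z_edges)
    centers = 0.5 * (edges[1:] + edges[:-1])
    if hist.sum() == 0:                        # fall back to a flat PDF
        return centers, np.full_like(centers, 1.0 / len(centers))
    return centers, hist / hist.sum()
```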

5.3.2. The Consideration of Conditional Probability

In our work, we only consider the probability of physical association of two galaxies. It may be possible to further generalize this method to consider the conditional probability of physical association of more galaxies simultaneously. For example, consider the linking criterion

$$P\left(|z_j - z_i| \le V_L \,\big|\, |z_j - z_k| \le V_L\right) > P_{\rm th}$$

for galaxies i, j, and k. It may reduce the chance of identifying an artificial superstructure in the line-of-sight direction.

5.4. The Alternative for Incompleteness Correction

When using a photometric redshift galaxy catalog, not only the detection limit but also the photometric redshift quality should be considered in correcting for the incompleteness of the data. As mentioned in § 3.1.3, galaxies appearing outside of the region occupied by the training set in the color-magnitude diagram have unpredictable photometric redshift behaviors (Hsieh et al. 2005). The incompleteness issue for this population of galaxies is slightly different from what we mentioned in § 5.2. For this population, we are very sure that they are detected, and we know the number of these objects, but we are not able to constrain their redshifts even by an empirical method. To avoid the interference of those poorly determined photometric redshifts, one might remove them from the photometric redshift galaxy catalog while identifying galaxy groups, and then recover their distribution by applying weighting. However, in the photometric data we can still determine the (R.A., decl.) coordinates of that population of galaxies accurately. Removing them from the galaxy catalog actually removes some useful information. The probability distribution function used in the PFOF method now provides an alternative that bypasses the incompleteness problem and further improves the efficiency of using the data, without throwing out any useful data. Below we describe how we face this problem.


If each time we generate more than one group-finding result with the same linking lengths but different probability thresholds, then in the high-probability-threshold result the galaxies with poorly determined photometric redshifts are naturally excluded from the groups, while in the low-probability-threshold result they are naturally included. This alternative thus describes the grouping of galaxies with the free parameter P_th, which characterizes our tolerance of the noise in the data.¹³

6. CONCLUDING REMARKS

We have developed a new algorithm to identify galaxy groups in a catalog in which the redshift errors of the galaxies have a large dispersion. The probability that two galaxies lie within the line-of-sight linking length V_L of each other is used to judge whether they are in the same group. Our concept can be further generalized to associate data points in higher dimensional spaces, where more than one degree of freedom of those data points carries a large and nonuniform uncertainty.

13 The group-finding results change as the probability threshold P_th is varied.


In addition, we suggest an alternative to the incompleteness correction. The contamination from galaxies with large photometric redshift errors can be assessed by changing the probability threshold P_th. We simulate the performance of the PFOF algorithm with the DEEP2 mock catalogs, which cover the redshift range 0.6–1.6 and include galaxies with apparent magnitudes R < 24.1. In the mock catalogs, 40% of the galaxies have spectroscopic redshift measurements, and the photometric redshift errors are about 0.03. Our simulations suggest that the PFOF algorithm can give >70% purities for galaxy groups with richness 4–7.

We thank our referee for very instructive and useful suggestions. The lead author acknowledges funding from ASIAA and National Taiwan University.

REFERENCES

Barnes, J. 1984, MNRAS, 208, 873
Beers, T. C., Flynn, K., & Gebhardt, K. 1990, AJ, 100, 32
Berlind, A. A., et al. 2006, ApJS, 167, 1
Blaizot, J., et al. 2005, MNRAS, 360, 159
Bode, P. W., Cohn, H. N., & Lugger, P. M. 1993, ApJ, 416, 17
Bolzonella, M., Miralles, J. M., & Pelló, R. 2000, A&A, 363, 476
Botzler, C. S., et al. 2004, MNRAS, 349, 425
Coil, A. L., et al. 2004, ApJ, 617, 765
Colless, M., et al. 2001, MNRAS, 328, 1039
Collister, A. A., & Lahav, O. 2004, PASP, 116, 345
Connolly, A. J., Csabai, I., & Szalay, A. S. 1995, AJ, 110, 2655
Couchman, H. M. P., Thomas, P. A., & Pearce, F. R. 1995, ApJ, 452, 797
Davis, M., et al. 2003, Proc. SPIE, 4834, 161
De Lucia, G., & Blaizot, J. 2007, MNRAS, 375, 2
Diaferio, A., Geller, M. J., & Ramella, M. 1994, AJ, 107, 868
Dye, S., et al. 2006, MNRAS, 372, 1227
Efstathiou, G. 1988, MNRAS, 232, 431
Eke, V. R., et al. 2004, MNRAS, 348, 866
Gal, R. R., et al. 2003, AJ, 125, 2064
Gavernato, F., Tozzi, P., & Cavaliere, A. 1996, ApJ, 458, 18
Geller, M. J., & Huchra, J. P. 1983, ApJS, 52, 61
Gerke, B. F., et al. 2005, ApJ, 625, 6
Giovanelli, R., & Haynes, M. D. 1991, ARA&A, 29, 499
Gladders, M. M., & Yee, H. K. C. 2000, AJ, 120, 2148
———. 2005, ApJS, 157, 1
Hsieh, B. C., et al. 2005, ApJS, 158, 161
Huchra, J. P., & Geller, M. J. 1982, ApJ, 257, 423
Ilbert, O., et al. 2006, A&A, 457, 841
Kitzbichler, M. G., & White, S. D. M. 2007, MNRAS, 376, 2
Koester, B. P., et al. 2007a, ApJ, 660, 221
———. 2007b, ApJ, 660, 239
Lax, D. A. 1985, J. Am. Stat. Assoc., 80, 736
Le Fèvre, O., et al. 2005, A&A, 439, 845
Li, I. H., & Yee, H. K. C. 2008, AJ, 135, 809
Lilly, S. J., et al. 1995, ApJ, 455, 108
Lin, H., et al. 1996, ApJ, 464, 60
———. 1999, ApJ, 518, 533
Longo, G., et al. 1994, A&A, 282, 418
Marinoni, C., et al. 2002, ApJ, 580, 122
Mendes de Oliveira, C., & Hickson, P. 1994, ApJ, 427, 684
Miller, C. J., et al. 2005, AJ, 130, 968
Mobasher, B., et al. 2007, ApJS, 172, 117
Monaghan, J. J. 2005, Rep. Prog. Phys., 68, 1703
Nichol, R. C. 2004, in Clusters of Galaxies: Probes of Cosmological Structure and Galaxy Formation, ed. J. S. Mulchaey, A. Dressler, & A. Oemler (Cambridge: Cambridge Univ. Press), 24
Rubin, V. C., Ford, W. K. J., & Hunter, D. A. 1991, ApJS, 76, 153
Springel, V. 2005, MNRAS, 364, 1105
Springel, V., & Hernquist, L. 2002, MNRAS, 333, 649
Springel, V., Yoshida, N., & White, S. D. M. 2001, NewA, 6, 79
Springel, V., et al. 2001, MNRAS, 328, 726
van Breukelen, C., et al. 2006, MNRAS, 373, L26
van den Bosch, F. C., Mo, H. J., & Yang, X. 2003a, MNRAS, 345, 923
van den Bosch, F. C., Yang, X., & Mo, H. J. 2003b, MNRAS, 340, 771
van den Bosch, F. C., et al. 2004, MNRAS, 352, 1302
———. 2007, MNRAS, 376, 841
van Dokkum, P. G., & Stanford, S. A. 2003, ApJ, 585, 78
Vanzella, E., et al. 2004, A&A, 423, 761
Warren, S. J., et al. 2007, MNRAS, 375, 213
White, M. 2002, ApJS, 143, 241
White, M., Hernquist, L., & Springel, V. 2002, ApJ, 579, 16
Yan, R., Madgwick, D. S., & White, M. 2003, ApJ, 598, 848
Yan, R., White, M., & Coil, A. L. 2004, ApJ, 607, 739
Yang, X., Mo, H. J., & van den Bosch, F. C. 2003, MNRAS, 339, 1057
Yang, X., et al. 2005, MNRAS, 356, 1293
