Journal of Andrology, Vol. 25, No. 4, July/August 2004 Copyright © American Society of Andrology

Quality Control of Laboratory Methods for Semen Evaluation in a Multicenter Research Study

CHARLENE BRAZIL,* SHANNA H. SWAN,† CHARLENE R. TOLLNER,* CATHY TREECE,* ERMA Z. DROBNIS,‡ CHRISTINA WANG,§ J. BRUCE REDMON,‖ JAMES W. OVERSTREET,* AND THE STUDY FOR FUTURE FAMILIES RESEARCH GROUP

From the *Division of Reproductive Biology, Department of Obstetrics and Gynecology, University of California, Davis, California; the †Department of Family and Community Medicine and the ‡Department of Obstetrics and Gynecology, University of Missouri-Columbia, Columbia, Missouri; the §Department of Medicine, Harbor-UCLA Medical Center, Los Angeles, California; and the ‖Departments of Medicine and Urologic Surgery, University of Minnesota, Minneapolis, Minnesota.

ABSTRACT: Rigorously standardized laboratory protocols and strict quality control (QC) are essential for meaningful comparisons between semen quality data from multiple sites. We describe our experience with the Study for Future Families (SFF), a multicenter study of semen quality in the United States. Detailed protocols were developed, and technicians from each study site attended a training session at the central laboratory. Technicians received blinded replicates from diluted semen specimens for counting by MicroCell and hemacytometer. Sperm motility was assessed using videotaped recordings for simple percent motility and categorical assessment of individual sperm progression as recommended by the World Health Organization (WHO). The mean intertechnician coefficient of variation for individual specimens was 12.6% for MicroCell counts, 15.2% for hemacytometer counts, and 10.5% for percent motility. Intratechnician coefficients of variation averaged 10.3% for MicroCell counts, 12.5% for hemacytometer counts, and 5.2% for percent motility. The average percent differences between the technicians’ values and the central standard for individual specimens were 13.5%, 16.6%, and 11.9% for MicroCell counts, hemacytometer counts, and simple percent motility, respectively. We achieved our goal of maintaining mean intratechnician coefficients of variation and mean percent differences from the standard values of 15% or less for measurements of simple percent motility and sperm concentration by MicroCell. Standardization using the Improved Neubauer hemacytometer chamber proved more difficult. We were not successful in standardizing a method for categorical assessment of individual sperm progression.

Key words: Sperm concentration, semen volume, sperm motility, quality control, observer variation, precision.

J Androl 2004;25:645–656

A number of investigators have reported on quality control (QC) procedures for the andrology laboratory (Mortimer et al, 1986; Dunphy et al, 1989; Knuth et al, 1989; Cooper et al, 1992; Mortimer, 1994; Clements et al, 1995). Two approaches have been used for external or interlaboratory QC. In the first approach, a central laboratory distributes QC samples to the study sites for proficiency testing of the andrology technicians (Neuwinger et al, 1990; Walker, 1992; Matson, 1995; Cooper et al, 1999; Gandini et al, 2000). Samples used for QC of sperm concentration determinations have included small vials of preserved semen (Jørgensen et al, 2001), suspensions of fixed, washed sperm (Neuwinger et al, 1990), and frozen aliquots of pooled semen (Gandini et al, 2000). Materials used to assess variation among laboratories in sperm motility assessment have included thawed, cryopreserved sperm (Neuwinger et al, 1990; Cooper et al, 1992; Muller, 1992; Clements et al, 1995) and videotapes of semen samples (Cooper et al, 1999; Gandini et al, 2000). Another approach used for interlaboratory QC has involved collaborating groups meeting in a central location for QC testing (Jequier and Ukombe, 1983; Jørgensen et al, 1997; Auger et al, 2000). A combination of these approaches has also been used (Jørgensen et al, 2001). Most studies have used the intertechnician (or interlaboratory) coefficient of variation to measure variability among technicians. Coefficients of variation reported for sperm concentration, percent sperm motility, and percent normal sperm morphology have been large and variable. Neuwinger et al (1990) reported interlaboratory coefficients of variation of 23%–73% for sperm concentration, 9%–37% for percent sperm motility, and 25%–87% for

This work was supported by grant R01 ES09916 to the University of Missouri from the National Institutes for Environmental Health Sciences (NIEHS), National Institutes of Health; grant MO1-RR00400 to the University of Minnesota General Clinical Research Center; and grant MO1RR0425 to the Research and Education Institute at the Harbor-UCLA Medical Center and the Cedars-Sinai Research Institute from the National Center for Research Resources, National Institutes of Health. Correspondence to: Dr James W. Overstreet, Center for Health and the Environment, University of California, One Shields Ave, Davis, CA 95616 (e-mail: [email protected]). Received for publication January 20, 2004; accepted for publication March 2, 2004.


percent normal sperm morphology. Gandini et al (2000) reported on results from 20 laboratories, with interlaboratory coefficients of variation for sperm concentration that ranged from 30% to 52%, coefficients of variation for percent motility that ranged from 39% to 72%, and coefficients of variation for percent normal morphology that ranged from 17% to 26%. Better QC results were reported when technicians from different laboratories were tested at a central location using fresh semen samples (Jørgensen et al, 1997; Auger et al, 2000). When 12 technicians were assessed in this manner, the mean intertechnician coefficients of variation were 22.9% and 21.8% for sperm concentration and percent motility, respectively (Auger et al, 2000). Because of the complexities and logistics of working with semen, it is virtually impossible for any QC exercise to mimic the actual at-bench experience for the technician. Ideally, the QC material should consist of blinded samples, unrecognized as QC by the technician, analyzed at the normal pace and intensity of the daily routine and representative of the range of specimens normally processed by the laboratory. In reality, choices must be made on the basis of what approach is practical, which components of the techniques are most likely to cause variability, and how thoroughly the entire process of semen evaluation can be assessed. Because the QC material cannot be disguised as a routine semen specimen, the technician may, unavoidably, give QC samples special treatment, and coefficients of variation obtained in QC exercises may underestimate the actual variability. However, the QC process itself introduces new variables that may increase the coefficient of variation (ie, sampling errors by the central laboratory or sperm alterations such as clumping in the fixed QC suspension).
In this communication, we report on our experience in applying QC procedures in a multicenter study of semen quality, the Study for Future Families (SFF) (Swan et al, 2003), conducted in 4 cities in the United States.

Materials and Methods

Technician Training

In the SFF, a central laboratory at the University of California, Davis had responsibility for standardizing supplies and equipment, writing protocols and procedures for a standardized semen analysis, training laboratory technicians from each center, reviewing all semen evaluation data generated, and administering regular proficiency testing. The details of the laboratory protocols, supplies, and equipment have been described in another publication (Brazil et al, 2004). Briefly, concentration and motility were both determined in 2 ways for each semen specimen; concentration was assessed by MicroCell and hemacytometer, and motility was assessed by the categorical method recommended by the World Health Organization (WHO, 1999) as well as with a simpler technique. Every technician was trained at the central laboratory during a weeklong training session, regardless of prior experience. Each technique was demonstrated and discussed and then performed by the trainees with one-on-one direction. All techniques were practiced repeatedly throughout the week, and both casual and blind comparisons were made between the trainees’ results and those of the trainers. Laboratory protocols and data collection forms developed by the central laboratory were distributed to all centers and used during training. These protocols and forms required complete raw data collection; this allowed the central laboratory to check for protocol compliance as well as to recheck calculations. Proficiency testing involved mailing standardized specimens to the study sites and permitted central monitoring and remedial action. The semen donors for the central laboratory were healthy young volunteers who gave written informed consent. Central laboratory activities were approved by the Human Subjects Review Committee of the University of California, Davis. The QC program described included assessment of sperm concentration and percent motility by technicians at the 4 study sites. The QC program was based on laboratory methods previously developed by the central laboratory (Overstreet and Brazil, 1997) and was refined in subsequent multicenter investigations (Guzick et al, 1999, 2001; Overstreet et al, 1999). The QC procedures for sperm morphology assessment, which was determined at the central laboratory, have been described previously (Guzick et al, 2001). Seven technicians from 4 sites were trained for this study and contributed to the QC data presented.

Of the 4 sites, 1 site had 2 trained technicians who participated in both study subject data collection and QC activities for the entire study; results from these 2 technicians (technicians 1 and 2) were presented separately in the data analysis. Two other sites each had 1 technician who collected all study subject and QC data for the length of the study (technicians 3 and 4). The last site had a total of 3 technicians (nonoverlapping) during the course of the study, although only one of them actually collected study subject data. For any given QC testing date, only 1 of the 3 technicians from this site participated in QC data collection. To allow the comparison of site-to-site QC summary data during the course of the study, the data from these 3 technicians were combined (technician 5).

Certification of Laboratory Technicians

Following their return to the study sites after training, each technician completed a written test that evaluated his or her knowledge of study goals and laboratory procedures. Questions were designed to test understanding of details of each procedure to be performed during the semen evaluation as well as an overall appreciation of the general study objectives. In addition, technicians were asked to perform a number of calculations required for the semen evaluation, including the types of calculations that often lead to errors in decimal placement and/or data recording. At their home site, technicians practiced semen evaluations on 5–10 pilot subjects, using the study protocols and reporting forms. These semen evaluation (SE) forms, as well as morphology smears and videotapes of sperm motility, were sent to the central laboratory for review. If these materials were acceptable, the technician was sent a package of QC materials for proficiency testing. If results from this proficiency testing met the study goals (see below), the technician was “certified” to collect data for the SFF.

Proficiency Testing

The SFF goals were for each technician to 1) maintain a mean percent difference from the standard value of 15% or less, and 2) maintain a mean intratechnician coefficient of variation of 15% or less for each technique (see below). The central laboratory distributed QC materials approximately every 3 months. QC dates were chosen in advance, and all technicians were encouraged to schedule blocks of time, as they would normally do for a study subject’s semen analysis, to analyze the QC materials without disturbance. In the SFF, sperm concentration was determined first using a MicroCell chamber and then again using an Improved Neubauer Phase hemacytometer chamber (Brazil et al, 2004). QC materials included preserved semen for assessing sperm concentration by these 2 counting methods as well as a videotape to be used to score sperm motility by 2 methods: a simple count of percent motility (Brazil et al, 2004) and a method that assigned individual motile sperm to categories on the basis of progression (WHO, 1999). Protocols and data recording sheets for analysis of the QC samples were included in each shipment. All materials were randomly assigned to technicians at the study sites and central laboratory and were shipped to the study sites by overnight mail. QC materials were either distributed to the technicians in the central laboratory by overnight mail to mimic the exposures of the packages going to the sites, or the packages were held at room temperature and then delivered directly to the technicians at the approximate time the sites received their shipments.
Sperm Concentration—QC samples for determination of sperm concentration were prepared by diluting semen in a preservative solution (to 100 mL of distilled water, add 1 g of bovine serum albumin [A4503; Sigma Chemical Co, St Louis, Mo], 2 g of polyvinylpyrrolidone [PVP-40; Sigma], 0.9 g of NaCl [S5886; Sigma], 0.1 mL of Triton X-100 [T-8787; Sigma], 0.004 mL of Silicone Antifoam [A-6425; Sigma], and 0.1 g of Sodium Azide [S-2002; Sigma]). Semen was diluted 1 part semen : 1 part diluent. One hundred microliters of this diluted semen were aliquoted randomly into coded dilution vials (Autoanalyzer cups, cat. no. 02-544-2; Fisher Scientific, Pittsburgh, Pa). The diluted semen was mixed continuously throughout the aliquoting process. Four to six semen specimens were used for each testing date. In most cases, the specimens were used as individual ejaculates, but occasionally, ejaculates were combined in order to obtain sufficient volumes for producing the QC samples. The semen specimens for each testing date were chosen to include a range of concentrations between 10 × 10⁶/mL and 200 × 10⁶/mL. Each of the 4 semen specimens contributed 8 coded vials of diluted semen for each study technician: 4 for MicroCell counts and 4 for hemacytometer counts. The 4 replicates mimicked the 2 or 3 replicate counts performed for each counting technique for the study semen evaluations (Brazil et al, 2004). The coded vials from each specimen were capped and then sealed with Parafilm and randomly assigned to 2 sets of boxes: 1 set for counting by the MicroCell chamber and 1 set for counting with the hemacytometer. Boxes were randomly assigned to the study technicians and the 2 central laboratory technicians. The 2 central laboratory technicians were not involved in the coding or aliquoting of QC samples. To minimize the possibility that technicians would recognize a pattern of 4 specimens replicated 4 times each, on some occasions, a fifth specimen was sent, or the number of replicates of 1 specimen was changed to 3 or 5. Once received at the study site, the coded samples for use with the MicroCell were counted without further dilution, and those for use with the hemacytometer were additionally diluted 1:10 by the technician before counting. These dilutions matched those used during semen evaluations for the SFF study. Technicians were asked to complete counts within 48 hours of receipt and to report the raw data and the calculated concentrations for each of the 32 vials to the central laboratory.

In the central laboratory, results were decoded, and each technician’s final concentration value for a specimen, for a given counting chamber, was taken to be the mean of the 4 blinded replicates for that specimen. This value was then compared to the “standard value,” which was defined as the average of the values from the 2 central laboratory technicians. A percent difference from the standard was calculated for each semen specimen; a mean percent difference (average of the absolute percent difference values for individual semen specimens) was calculated for each technician for each testing date. Additionally, the average concentration for all vials scored by a technician for each counting chamber on a given testing date was calculated to detect general tendencies toward over- or underestimating concentrations with respect to the central laboratory standards.
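The percent-difference calculation described above can be sketched in a few lines. This is a minimal illustration, not the study's own software: the function names are hypothetical, the counts are invented, and it assumes the difference is expressed relative to the standard value (the text does not state the denominator explicitly).

```python
from statistics import mean

def specimen_value(replicates):
    """A technician's value for one specimen: the mean of the blinded replicates."""
    return mean(replicates)

def percent_difference(tech_replicates, std1_replicates, std2_replicates):
    """Signed percent difference of a technician's value from the standard value.

    The standard value is the average of the 2 central laboratory
    technicians' values; dividing by the standard is an assumption.
    """
    standard = mean([specimen_value(std1_replicates),
                     specimen_value(std2_replicates)])
    return 100.0 * (specimen_value(tech_replicates) - standard) / standard

def mean_percent_difference(per_specimen_differences):
    """Average of the absolute percent differences for one testing date."""
    return mean(abs(d) for d in per_specimen_differences)

# Hypothetical counts (x10^6/mL) for 2 specimens; not study data.
diffs = [
    percent_difference([55, 57, 53, 55], [50, 50, 50, 50], [50, 50, 50, 50]),
    percent_difference([90, 90, 90, 90], [100, 100, 100, 100], [100, 100, 100, 100]),
]
print(diffs)                           # one signed value per specimen
print(mean_percent_difference(diffs))  # compared against the 15% goal
```

Keeping the signed values alongside the absolute mean mirrors the two uses in the text: the absolute mean is checked against the 15% goal, while the sign shows whether a technician tends to over- or underestimate.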
To measure an individual technician’s variability, or precision, the central laboratory determined the intratechnician coefficient of variation from the blinded replicates of each sample for each counting technique. The study goal was for each technician to maintain a mean percent difference from the standard of 15% or less and a mean intratechnician coefficient of variation of 15% or less for each testing date. As part of each quarterly QC mailing, an assessment was made of the accuracy of the pipettors used for dilutions at each site. Each technician was asked to measure and weigh replicate aliquots of water with the same positive displacement pipettors and the same volume settings as those used for the concentration dilutions in the study. These results (not shown) were sent to the central laboratory and provided assurance that pipettors were appropriately maintained and calibrated throughout the study.

Sperm Motility—To evaluate proficiency in sperm motility assessment, 2 master videotapes were prepared and duplicated by the central laboratory. Tape 1 was used for the first 4 quarterly QC mailings as well as for the seventh and eighth mailings. Tape 2 was used for the fifth and sixth mailings. Each videotape contained replicate images from a total of 7 (tape 1) or 5 (tape 2) different semen specimens. For each semen specimen, 10 representative fields of the semen were recorded. These same 10 fields were always used together to represent the specimen and simulated the random fields sampled by technicians while performing a motility assessment on a study specimen (Brazil et al, 2004). The 10 fields for each specimen were repeated (together)


4 times randomly on the videotape, and each time, they were coded as a different specimen. During videotaping of each specimen, care was taken to avoid the inclusion of any cells or debris in any of the fields that would allow the specimen to be easily recognized or identified on the videotape. The videotapes had either 28 “unknown” specimens (tape 1) or 20 “unknown” specimens (tape 2) (ie, 7 or 5 semen specimens repeated 4 times each for a total of 280 or 200 separate fields of semen on videotapes 1 and 2, respectively). Before initial distribution, the master videotapes were analyzed blindly in the central laboratory by 3 or 4 experienced technicians, and the mean values from these initial analyses were used as the standard values throughout the study. This allowed an ongoing monitoring of the standard technicians as well as the study technicians, an important consideration given that the standard technicians theoretically could be training new study technicians any time during the course of the study.

To evaluate percent motility from the videotape, an acetate overlay of a reticle grid image was placed over the video monitor. This image mimicked the grid ocular in the microscope used during motility analysis of the live SFF semen specimens (Brazil et al, 2004) and was of the same relative magnification as that seen through the eyepiece reticle. Simple percent motility was analyzed according to study protocols. The portion of the grid analyzed depended on the concentration (Brazil et al, 2004). The raw values of motile and nonmotile sperm in each field were scored and recorded. The technician also assigned a progression score of 1–4 to each specimen using the same scoring system used during the SFF evaluations (Brazil et al, 2004). At the end of the analysis, the technician calculated the percent motility for each specimen using the data gathered from each of the 10 fields.
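As a rough illustration of the videotape layout and the motility calculation just described, the sketch below counts the coded specimens and fields on each tape and computes percent motility for one specimen. The function names and field counts are hypothetical, and pooling the raw counts across the 10 fields (rather than averaging per-field percentages) is an assumption; the exact formula is given in the study protocol (Brazil et al, 2004).

```python
FIELDS_PER_SPECIMEN = 10  # fields recorded for each semen specimen
REPEATS_ON_TAPE = 4       # each set of 10 fields appears 4 times under new codes

def tape_layout(n_specimens):
    """Return (coded 'unknown' specimens, total fields) for one videotape."""
    coded = n_specimens * REPEATS_ON_TAPE
    return coded, coded * FIELDS_PER_SPECIMEN

def percent_motility(field_counts):
    """Percent motility from (motile, nonmotile) counts, pooled over all fields."""
    motile = sum(m for m, _ in field_counts)
    total = sum(m + n for m, n in field_counts)
    return 100.0 * motile / total

print(tape_layout(7))  # tape 1
print(tape_layout(5))  # tape 2

# Hypothetical raw counts for the 10 fields of one coded specimen.
fields = [(10, 10)] * 5 + [(12, 8)] * 5
print(percent_motility(fields))
```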
Categorical motility analysis (WHO, 1999) was performed similarly, using the acetate overlay and separately scoring the 4 categories of sperm progression. Following analysis of the 10 fields, the percentage of sperm in each category was determined from the overall totals of sperm scored. Similar to the concentration results, the motility results were decoded upon receipt by the central laboratory, and a mean value for each specimen was determined. Additionally, mean intratechnician coefficients of variation and mean percent differences from the standard values were calculated for each technician and, as for concentration determinations, the study goal was for both intratechnician coefficient of variation and mean percent difference from the standard value to be 15% or less.

Corrective Action—If the mean intratechnician coefficient of variation or percent difference from the standard value for sperm concentration or simple percent motility was greater than 15%, then the technician was considered “out of range,” and corrective action was initiated by the central laboratory. This process began with telephone consultations and a protocol review to identify the potential causes of the discrepancy. Consideration was given to whether a technician was consistently high or low in counting or motility scoring, and he or she was counseled accordingly. Particular emphasis was given to those procedures most likely to be sources of variability: mixing semen, vortexing diluted semen, immediately removing aliquots after mixing or vortexing, locating areas for counting in the MicroCell, scoring sperm on the outside grid lines for the hemacytometer, and generally identifying sperm for counting. Problems with motility analysis were generally solved by encouraging the technician to analyze a very small section of the grid at one time. Technicians were advised on the importance of not waiting for motile sperm to enter the grid to begin the analysis and, likewise, the importance of freezing the image in their minds in order to eliminate analyzing sperm that were in the grid as well as sperm that swam into the grid during the analysis time. An occasional high coefficient of variation or percent difference for 1 specimen was not considered cause for corrective action. Once technicians were trained and certified, problems with simple motility or MicroCell assessments were unusual, though it was not uncommon for hemacytometer values to be in the 15%–20% difference range.
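The out-of-range rule above can be expressed as a short check. This is a sketch under stated assumptions: the names are hypothetical, the replicate counts are invented, and the coefficient of variation is taken as the sample standard deviation (n − 1 denominator) divided by the mean, which the text does not specify.

```python
from statistics import mean, stdev

GOAL_PERCENT = 15.0  # study goal for both measures

def cv_percent(replicates):
    """Coefficient of variation (%) of one specimen's blinded replicates."""
    return 100.0 * stdev(replicates) / mean(replicates)

def out_of_range(mean_cv, mean_pct_diff, goal=GOAL_PERCENT):
    """True if either mean measure for a testing date exceeds the goal."""
    return mean_cv > goal or mean_pct_diff > goal

# Hypothetical replicate counts (x10^6/mL) for 2 specimens on one testing date.
replicate_sets = [[48, 52, 50, 50], [95, 105, 100, 100]]
mean_cv = mean(cv_percent(r) for r in replicate_sets)

print(round(mean_cv, 1))
print(out_of_range(mean_cv, 13.5))  # percent difference within the goal
print(out_of_range(mean_cv, 16.6))  # percent difference above the goal
```

Because the rule is applied to means over a testing date, a single specimen with a high coefficient of variation does not by itself trigger corrective action, which matches the text.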

Audit of Materials Produced at Study Sites

Every 3 weeks, technicians at the study sites shipped all accumulated SE forms, morphology smears, and videotapes to the central laboratory. Every SE form was checked to ensure complete data collection, data recording, and legibility. All calculations for sperm counts, percent differences, and means or medians were recalculated by the central laboratory. Any errors found were corrected on the SE forms, and these corrected forms were copied and sent to the data-coordinating center for entry into the main database. Additionally, errors or problems were discussed with individual technicians. The time elapsed from the semen collection to the beginning of the analysis, as well as the actual time the technician took to complete the analysis, was determined.

Results

QC at the Initiation of the Study

In the pilot phase of the study, problems were identified and corrected by the central laboratory. Initial problems included empty or inappropriate use of data fields, missing abstinence times, missing raw data for counts, inaccurate calculations or rounding errors, inappropriate slide labeling, incorrect stage warmer temperatures, and illegible writing. With intensive feedback from the central laboratory, the incidence of these problems dropped significantly. Efforts were made to standardize the time from semen collection to the beginning of evaluation (30–45 minutes), as well as the time required to complete the semen evaluation at all sites (<1 hour), so that the motility assessments and videotaping would be performed at approximately the same time after semen collection at all sites. As seen in Figure 1, there were initially large differences among sites in the time between semen collection and the beginning of the analysis, as well as in the time required to complete the evaluation. Changes in workplace environment were made at sites 2 and 4, where analysis time subsequently decreased, and time from semen collection to beginning of analysis was also shortened (Figure 1).


Figure 1. Time from semen collection to completion of semen analysis for technicians at 4 study sites. The left side shows data from the initial phase of the study, and the right side shows cumulative data from the first 509 semen evaluations.

In the first QC assessment following initial technician training, all technicians’ reported values for sperm concentration by MicroCell technique and simple percent motility were within 15% of the standard value of the central laboratory. Additionally, all technicians had a coefficient of variation for repeated measurements (intratechnician coefficient of variation) of 15% or less for all sperm measures. At this point, the technicians were considered certified, and they began collecting data on study subjects for analysis in the SFF, despite the fact that initial QC results for sperm counts with the hemacytometer technique were not all within 15% of the standard value of the central laboratory. Ancillary studies in the central laboratory suggested that the QC materials had contributed to the variability in hemacytometer results in the first 2 QC assessments (data not shown). For the first 2 QC dates, the semen was diluted 1:20 (1 + 19) for hemacytometer counts, so that no additional dilution was performed by the technician at the study site. It was concluded that at a 1:20 dilution, the viscosity of the preservative solution interfered with sperm settling on the hemacytometer grid, increasing variability in counting. For all subsequent QC dates, central laboratory dilutions were decreased to 1 part semen : 1 part diluent. After receipt at the study site, the technician then performed an additional 1:10 (1 + 9) dilution in water before counting. Although this change led to some decrease in intertechnician variability, throughout the study, QC results for hemacytometer counts continued to be more variable than those for MicroCell counts, and they occasionally exceeded the goal of 15% difference from the standard.
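The arithmetic behind the change in sample preparation is worth making explicit: both schemes give the same final 1:20 dilution of semen, but they leave very different amounts of preservative in the suspension loaded onto the hemacytometer. The sketch below works this out with exact fractions. The helper name is hypothetical, and it assumes the original 1 + 19 dilution was made entirely in preservative solution, which the text implies but does not state.

```python
from fractions import Fraction

def dilute(parts_sample, parts_diluent):
    """Fraction of the sample remaining after a (sample + diluent) dilution."""
    return Fraction(parts_sample, parts_sample + parts_diluent)

# First 2 QC dates: a single 1 + 19 dilution at the central laboratory
# (assumed to be made in preservative solution).
one_step_semen = dilute(1, 19)
one_step_preservative = 1 - one_step_semen

# Later QC dates: 1 + 1 in preservative, then 1 + 9 in water at the study site.
two_step_semen = dilute(1, 1) * dilute(1, 9)
two_step_preservative = (1 - dilute(1, 1)) * dilute(1, 9)

print(one_step_semen, two_step_semen)                # same final semen fraction
print(one_step_preservative, two_step_preservative)  # very different preservative load
```

Under these assumptions the preservative makes up 19/20 of the counted suspension in the original scheme but only 1/20 in the revised one, which is consistent with the authors' conclusion that viscosity of the preservative interfered with sperm settling.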

QC During Progress of the Study

QC materials were distributed 8 times, at approximately 3-month intervals, between May 1999 and January 2002.

Figure 2. Sperm concentrations obtained for quality control (QC) suspensions by 5 different technicians on 8 testing dates. Each data point represents the mean value for a given technician for a testing date. The mean value is the average of the 4 semen samples evaluated on a particular date and therefore should not be equal from one testing date to the other. (A) Shows concentrations determined with MicroCell chambers and (B) shows counts determined with hemacytometer chambers.

On average, the technicians at each study site spent 8–10 hours analyzing each set of QC materials. During the 2-year period, 9 technicians (7 from study sites and 2 from the central laboratory) determined the concentrations of 128 unknown sperm suspensions for each counting technique produced from approximately 32 different ejaculates. In addition, 208 videotaped images (each containing 10 fields of semen) produced from 12 different semen samples were analyzed for motility. The overall mean values for all of these specimens scored during the course of the study are shown for each technician in Table 1. When intertechnician coefficients of variation for the SFF technicians were calculated from the mean concentration values shown in Table 1, they were 5.6% for MicroCell counts and 8.2% for hemacytometer counts. Figure 2A shows mean MicroCell values for each technician from


Table 1. Overall mean values for semen parameters of all the QC samples analyzed during the course of the 2-year SFF study*†

| Technician | Sperm Concentration by MicroCell (×10⁶/mL) | Sperm Concentration by Hemacytometer (×10⁶/mL) | Simple Percent Motility (%) | Progression Score | Motility Category “a” (%) | Motility Category “a + b + c” (%) |
|---|---|---|---|---|---|---|
| Standard 1 | 61.7 | 85.1 | 49 | 2.4 | 22 | 48 |
| Standard 2 | 61.6 | 77.5 | 52 | 2.3 | 26 | 53 |
| Technician 1 | 62.2 | 86.8 | 46 | 2.9 | 35 | 48 |
| Technician 2 | 63.2 | 74.9 | 52 | 2.7 | 42 | 52 |
| Technician 3 | 56.4 | 72.6 | 47 | 3.0 | 36 | 47 |
| Technician 4 | 55.9 | 74.6 | 49 | 2.9 | 43 | 51 |
| Technician 5 | 59.4 | 71.2 | 50 | 2.7 | 46 | 54 |
| Mean, all techs | 60.1 | 77.5 | 49.3 | 2.7 | 35.7 | 50.4 |
| Mean intertechnician CV, study site technicians only | 5.6% | 8.2% | 4.9% | 4.7% | 11.7% | 5.7% |
| Mean intertechnician CV, all technicians, including standards | 4.8% | 7.9% | 4.6% | 9.8% | 25.1% | 5.5% |

* CV indicates coefficient of variation; QC, quality control; and SFF, Study for Future Families.
† N = 128 sperm counts for each chamber; n = 168 videotaped images analyzed for motility. Intertechnician CVs are determined from the mean values in the table. Motility results are shown for the 6 testing dates on which the same videotape of 28 specimens (280 fields) was used.
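The intertechnician coefficients of variation reported for Table 1 can be checked from the technician means themselves, as the footnote indicates. The sketch below does this for the two concentration measures. It assumes the coefficient of variation is the sample standard deviation (n − 1 denominator) divided by the mean; the source does not state the denominator, but this choice reproduces the reported values after rounding.

```python
from statistics import mean, stdev

# Mean concentrations (x10^6/mL) from Table 1, in the order
# Standard 1, Standard 2, Technician 1, 2, 3, 4, 5.
microcell = [61.7, 61.6, 62.2, 63.2, 56.4, 55.9, 59.4]
hemacytometer = [85.1, 77.5, 86.8, 74.9, 72.6, 74.6, 71.2]

def cv_percent(values):
    """Coefficient of variation (%): sample SD divided by the mean."""
    return 100.0 * stdev(values) / mean(values)

STUDY_SITE_ONLY = slice(2, None)  # drop the 2 central laboratory standards

results = {}
for name, column in (("MicroCell", microcell), ("Hemacytometer", hemacytometer)):
    results[name] = (
        round(cv_percent(column[STUDY_SITE_ONLY]), 1),  # study site technicians
        round(cv_percent(column), 1),                   # all technicians
    )
    print(name, results[name])
```

The study-site values come out at 5.6% and 8.2%, matching the figures quoted in the text for MicroCell and hemacytometer counts.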

the 16 blinded replicates for each testing date. Figure 2B shows similar information for hemacytometer values. In general, the standard value obtained by the central laboratory technicians was in the mid range of these values. Note that each testing date has data from different semen specimens, and no attempt was made to standardize the expected mean from testing date to testing date; therefore, data should not be linear between testing dates. Figure 3 shows the mean values for simple motility assessments for 6 testing dates. Since all 6 testing dates shown used

Figure 3. Percent motility values obtained from quality control (QC) videotapes by 5 technicians on 6 testing dates. Each data point represents the mean value for a given technician for a testing date. Dates 1–6 used the same videotape containing the same 280 fields of semen. Ideally, motility values from date to date should be a straight line. The tape standard value, which was set at the beginning of the study, is constant.

the same videotape (the same 280 fields of semen, though labeled as a new videotape each time), ideally, these mean values for a given technician should be linear over time. For motility assessments, the intertechnician coefficient of variation determined from mean values shown in Table 1 was 4.9% for simple percent motility. Intertechnician coefficients of variation for categorical motility were 11.7% and 5.7% for categories “a” and “a plus b plus c,” respectively. Intertechnician values for all technicians (including study technicians and central standard technicians) are shown in the last row of Table 1.

Intertechnician Coefficient of Variation for Individual Specimens—The range of intertechnician coefficients of variation for the 32 individual QC specimens scored by MicroCell counting was 3.9%–17.8%, with a mean coefficient of variation of 12.6% (Table 2). Similarly, intertechnician coefficients of variation determined from the same specimens using the hemacytometer chamber ranged from 7.5% to 26.4% and averaged 15.2%. The range of intertechnician coefficients of variation for the videotaped specimens scored for simple percent motility was 2.9%–33.1%, with a mean of 10.5%. Coefficients of variation determined from categorical motility assessments of the same videotaped specimens are shown in Table 2 for “a” motility, as well as for the collapsed values for “a plus b plus c” motility. Because of the difficulties encountered standardizing the “a” and “b” motility assignments (see “Discussion”), Table 2 shows “a” motility coefficients of variation for the study technicians alone as well as for all technicians, including the central standards. Table 2 also shows intertechnician co-

Brazil et al · Quality Control of Semen Evaluation

651

Table 2. Intertechnician coefficients of variation for individual semen specimens; the data were obtained from study technicians and central laboratory technicians, except as noted* Testing Material 33 semen specimens

Same 33 semen specimens

Videotaped images of 12 semen specimens Same 12 videotape specimens Same 12 videotaped specimens Same 12 videotaped specimens

Same 12 videotaped specimens

during the course of the study. In general, technicians were more precise when using the MicroCell than when using the hemacytometer for concentration determinations. Additionally, technicians’ mean precision was greater when scoring simple motility than when trying to score either category ‘‘a’’ alone or retrospectively combining ‘‘a,’’ ‘‘b,’’ and ‘‘c’’ categories. Table 3 also shows progression estimates used in conjunction with simple motility evaluations. The last row of Table 3 shows mean intratechnician coefficients of variation for all technicians, including both study technicians and central standard technicians. Percent Difference From the Standards—Table 4 shows the mean percent differences from the standard value for semen parameters of the individual ejaculates analyzed for QC. In general, technicians exhibited higher differences from the standard for sperm concentration determination by hemacytometer chamber than by MicroCell chamber. Table 4 also shows the effect of changing the method for preparation of QC samples for hemacytometer counting. The third column shows the data for all testing dates combined, and the fourth column shows the data for all dates except for the first 2, after which the central laboratory changed the QC sample preparation method. For all technicians, the percent difference from the standard decreased after the new method was introduced; however, the percent difference values for the hemacytometer continued to be higher than the percent difference values for the MicroCell. Difference Between MicroCell and Hemacytometer Counts—Table 5 shows the percent differences between MicroCell counts and hemacytometer counts by technician for both QC data presented in this study, as well as for study subject data presented elsewhere (Brazil et al, 2004). Some technicians had nearly the same relative per-

Technique MicroCell counts CV range, 3.9%–17.8%† Mean CV 12.6%‡ Hemacytometer counts CV range, 7.5%–26.4% Mean CV, 15.2% Simple percent motility CV range, 2.9%–33.1% Mean CV, 10.5% Progression CV range, 6.0%–26.7% Mean CV, 13.6% ‘‘a 1 b 1 c’’ motility CV range, 3.9%–36.9% Mean CV, 11.1% ‘‘a’’ motility only (excluding standard values) CV range, 4.5%–80.8% Mean CV, 23.0% ‘‘a’’ motility only (including standard values) CV range, 4.5%–83.9% Mean CV, 40.3%

* CV indicates coefficient of variation. † Range of intertechnician CVs for individual semen specimens. ‡ Mean intertechnician CV for all semen specimens.

efficients of variation for progression estimates used with the simple motility scoring. Intratechnician Coefficient of Variation for Individual Specimens—The data in Table 3 show the mean intratechnician coefficients of variation for each of the sperm concentration and motility determinations reported by the 7 technicians. These mean values were determined from all blinded QC replicates scored for individual specimens

Table 3. Mean intratechnician coefficients of variation for individual technicians, as determined from blinded replicates analyzed in all QC concentration samples and videotaped motility images*

Technician Standard 1 Standard 2 Technician 1 Technician 2 Technician 3 Technician 4 Technician 5 Mean intratechnician CV, SFF study technicians only Mean intratechnician CV, all technicians, including standards

MicroCell (%)

Hemacytometer (%)

Motility (%)

Progression (%)

WHO ‘‘a’’ (%)

WHO ‘‘a 1 b’’ (%)

WHO ‘‘a 1 b 1 c’’ (%)

9.8 9.3 12.6 9.9 8.4 10.9 11.5

10.1 10.0 10.6 12.6 11.9 17.4 15.1

4.8 5.3 5.6 4.0 4.7 3.4 8.8

5.0 6.8 7.3 5.6 4.2 2.1 9.3

33.9 16.7 14.8 6.6 12.7 8.4 18.1

6.5 7.6 6.9 4.7 6.8 5.4 8.5

5.1 5.3 6.8 4.7 5.2 5.4 8.4

10.7

13.5

5.3

5.7

12.1

6.5

6.1

10.3

12.5

5.2

5.8

15.9

6.6

5.8

* CV indicates coefficient of variation; QC, quality control; SFF, Study for Future Families; and WHO, World Health Organization.

Journal of Andrology · July/August 2004

652

Table 4. Mean percent difference from the standard value for semen parameters of individual ejaculates analyzed for QC*

Technician Standard 1† Standard 2† Technician 1 Technician 2 Technician 3 Technician 4 Technician 5 Mean difference, study site technicians only

MicroCell (%)

Hemacytometer, All Testing Dates (%)

Hemacytometer, Testing Dates 3–8 (%)

Motility (%)

5.3 5.7 12.8 12.7 13.7 13.5 14.6

7.6 8.4 14.1 18.9 16.1 16.8 23.3

6.9 7.6 13.6 17.8 15 16.2 20.2

5.2 7.1 11.1 11.7 7.6 5.8 14.3

13.5

17.8

16.6

11.9

* QC indicates quality control. † For concentration determinations, the percent difference measures the difference between the standard value and the mean of the 2 standard values. For motility assessments, the percent difference for all technicians, including the standards, is relative to the ‘‘gold standard’’ value for the videotape, determined before the study began and used throughout the study.

cent difference between the chambers whether they analyzed data from live semen specimens for the study or data obtained from QC samples (technicians 2 and 3), and yet others had large differences between evaluations on semen and those from QC samples (technicians 4 and 5). Three of the 5 technicians had greater differences between the 2 chambers in their QC counts than in the study data, while 1 technician had much greater differences in the study data. Table 5 also shows summary percent differences between the 2 methods based on means from the SFF study semen evaluation data as well as all means from the SFF technicians’ QC data.
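As a quick arithmetic check, the two summary rows of Table 3 can be reproduced as unweighted means of the per-technician values. The Python sketch below is illustrative only: the dictionary layout and the function name `summary_rows` are assumptions introduced here, and unweighted averaging is inferred from the fact that it matches the published rows to one decimal place, not from a stated method.

```python
from statistics import fmean

# Mean intratechnician CVs (%) copied from Table 3, in row order:
# Standard 1, Standard 2, Technician 1..5.
TABLE_3 = {
    "MicroCell":     [9.8, 9.3, 12.6, 9.9, 8.4, 10.9, 11.5],
    "Hemacytometer": [10.1, 10.0, 10.6, 12.6, 11.9, 17.4, 15.1],
    "Motility":      [4.8, 5.3, 5.6, 4.0, 4.7, 3.4, 8.8],
    "Progression":   [5.0, 6.8, 7.3, 5.6, 4.2, 2.1, 9.3],
    "WHO a":         [33.9, 16.7, 14.8, 6.6, 12.7, 8.4, 18.1],
    "WHO a+b":       [6.5, 7.6, 6.9, 4.7, 6.8, 5.4, 8.5],
    "WHO a+b+c":     [5.1, 5.3, 6.8, 4.7, 5.2, 5.4, 8.4],
}
N_STANDARDS = 2  # the first two entries are the central standard technicians


def summary_rows(values):
    """Return (mean of study technicians only, mean of all technicians)."""
    study_only = fmean(values[N_STANDARDS:])
    everyone = fmean(values)
    return round(study_only, 1), round(everyone, 1)


for method, values in TABLE_3.items():
    study_only, everyone = summary_rows(values)
    print(f"{method:14s} study technicians: {study_only:5.1f}  all: {everyone:5.1f}")
```

The same check on the “a” motility column makes the effect of the central standards visible: including them raises the mean from 12.1% to 15.9%.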

Table 5. Percent differences between MicroCell counts and hemacytometer counts, by technician, for both SFF study counts and QC counts*

Difference Between MicroCell Counts and Hemacytometer Counts

| Technician | SFF Study Data (%) | QC Data (%) |
| --- | --- | --- |
| Standard 1 | NA | 38 |
| Standard 2 | NA | 26 |
| Technician 1 | 28 | 40 |
| Technician 2 | 13 | 19 |
| Technician 3 | 32 | 29 |
| Technician 4 | 11 | 33 |
| Technician 5 | 36 | 20 |
| Mean % difference for all samples, study site technicians only | 24 (n = 509 study subjects†) | 29 (n = 128 replicates from 32 ejaculates) |

* QC indicates quality control; SFF, Study for Future Families.
† The SFF study data are given in Brazil et al, 2004.

Discussion

The results reported in this communication were obtained in a well-controlled multicenter study. Strictly speaking, these data and the conclusions drawn from them can be applied only to such studies. Nevertheless, our results are also relevant to the operation of clinical andrology laboratories. Because only fertile males were examined in this study, higher values for semen parameters and lower variation might be expected in comparison with populations of infertile men. The findings of this study highlight the problems and limitations of implementing protocols for standardized semen evaluation in individual laboratories and of comparing the results obtained in different laboratories. Our results also show that these problems and limitations can be reduced substantially when standardized protocols, thorough training of technicians, and elaborate proficiency testing programs are in place.

These data from the SFF demonstrate that our standards for technician certification (concentration and motility values within 15% of the standard, with an intratechnician CV of 15% or less) are attainable in a carefully controlled multicenter study using the MicroCell chamber for both sperm counts and simple motility assessment. We set these standards a priori on the basis of more than 10 years of experience by the central laboratory in training and assessing andrology technicians for multicenter studies using these 2 techniques. Other investigators have chosen similar thresholds to differentiate technicians who would be considered “exact and precise” from those who were “inexact and imprecise” (Auger et al, 2000). The procedures required to train technicians to this level of proficiency, as well as those required to monitor technician performance, are complex and may not be practical for application in all clinical and research laboratories.

This was our first attempt to standardize the use of the hemacytometer chamber as well as the categorical motility scoring system for a multicenter study, and we were not able to achieve the same standard for either of these methods. While the mean intratechnician coefficient of variation was 13.5% for the hemacytometer, the mean intertechnician coefficient of variation was 15.2%, and the mean percent difference exceeded 16%. However, retrospectively, the intertechnician variability (coefficient of variation) in hemacytometer counts, determined from the overall mean values of all QC samples, was only 8% (Table 1). For population data from multicenter studies, this measurement may be as important as individual specimen data.

In our experience of training more than 70 technicians for multicenter studies, a technician’s experience in performing semen evaluation does not necessarily predict proficiency in implementing study protocols; rather, the biggest predictor of success is a bright, motivated, detail-oriented person with an analytical approach.
If given thorough training, inexperienced technicians with keen observational skills can perform semen analysis at least as accurately and precisely as technicians with years of experience.

In our experience, multicenter andrology studies present many challenges for standardization, as sites often have unique clinical situations. Communication and support among investigators, study coordinators, and technicians are absolutely essential for the success of multicenter studies. Technicians in busy clinical laboratories are presented with new scheduling challenges to accommodate study specimens, and they are expected to analyze these specimens using elaborate protocols, which are often quite different from their clinical protocols. Initial planning by the clinical laboratory is made more difficult because of the variability in the rate at which technicians attain competence in performing the semen evaluation to study standards.

Perhaps the greatest challenge for multicenter studies of semen quality is to standardize and monitor the methodology for determination of sperm concentration (Björndahl and Kvist, 1998). As in the present study, aliquots of preserved semen have been used frequently as QC materials for assessing proficiency in sperm concentration determination (Cooper et al, 1999; Gandini et al, 2000; Jørgensen et al, 2001). The shortcoming of this approach is that it does not adequately evaluate the variability associated with mixing and sampling from the whole volume of semen. The volume of preserved semen is typically small compared to that of a semen sample, resulting in very different mixing dynamics from those encountered with an ejaculate in a specimen container. Fixed, dead sperm also settle and clump differently in solution than do live sperm in semen (Keel et al, 2000). Even when groups of technicians are tested together using live semen, the volume given to each technician is likely to be very small (150 µL in the study of Auger et al, 2000), again altering the routine semen mixing procedures. Nevertheless, when aliquots of fresh semen or fixed/diluted semen are prepared properly, all technicians receive essentially the same sample, and most variability detected can likely be attributed to specific aspects of the counting procedure, as opposed to confounding variability from semen mixing and sampling. It can be argued that this is the more important variability to assess, given that after adequate training with semen evaluation protocols and procedures, variability from sampling during the semen evaluation can be minimized. Study data from the first 509 semen evaluations in the SFF suggest that the sampling error can be quite small. The mean coefficients of variation determined from the replicate MicroCell counts, hemacytometer counts, and percent motility assessments performed on individual specimens were 3.9%, 4.4%, and 4.1%, respectively (Brazil et al, 2004).
For concentration assessments by both MicroCell and hemacytometer, this includes both the sampling error in making the dilutions from the semen and the sampling from the diluted count vials. These mean coefficients of variation determined from concentration replicates of study subjects are all lower than those seen during our QC exercises on blinded replicates (ie, 3.9% on study samples vs 10.7% for QC samples for MicroCell concentrations and 4.4% vs 13.5% for hemacytometer values). This could be because the replicates for the study were not performed blindly and the technician’s second assessment was possibly influenced by the first, or it could simply be because new sources of variability were introduced with the QC samples.

Variability in sperm motility evaluation is probably best assessed by bringing groups of technicians together for evaluations of fresh semen. In studies with geographically distant sites, this approach is impractical except for initial training, and it may fail to monitor sampling errors, as discussed above. Motility evaluation also has been assessed using aliquots of pooled, cryopreserved semen. A weakness of this approach is that biological variability in the motility of thawed cryopreserved sperm may result in uncontrolled variability in the QC materials (Cooper et al, 1992; Muller, 1992; Clements et al, 1995). It is also expensive to ship cryopreserved semen and difficult to get large numbers of aliquots from the same sample (Cooper et al, 1999). Another drawback of this approach is that the range of postthaw motility may be much lower than that encountered with fresh semen. As with fresh semen, the small sample volume also limits the usefulness of this material for assessing semen mixing and sampling errors.

Videotaped images were used to monitor motility assessment in the present study. Two valid criticisms of videotaped images are that they do not reproduce the experience of looking through a microscope (Matson, 1995) and that they do not allow an assessment of semen mixing and sampling. The first problem was addressed in the current study by the use of the acetate grid overlay for the video monitor to more closely mimic live specimen analysis through the reticle grid of the microscope. Although videotapes do not allow monitoring of procedures for mixing and sampling of semen, they do offer the alternate advantage that all technicians see exactly the same image, and all variability can be attributed directly to technician analysis, rather than to the possibility of any biological variations or sampling errors. Repeated analysis of the videotape at a later date can also reveal drift over time, a particularly important variable in multicenter studies where study recruitment may take place at different rates during different time periods at varying sites. Videotapes can also be valuable in ensuring standardization among different studies. In the present study, 2 sets of videotapes were used for QC. Each videotape was labeled uniquely, and the technicians were unaware that the same videotape was repeated.
The likelihood of bias due to familiarity was further minimized by the fact that each tape included 200 or 280 different fields of semen. The technicians recorded raw data following analysis of each field and calculated final percent motility only after the analysis of all specimens was completed.

The mean intratechnician coefficients of variation obtained during QC exercises for simple motility analysis were near those from replicate simple motility assessments from the 509 study semen evaluations (4.1% CV for replicate motility counts from study data [Brazil et al, 2004] and 5.3% CV for replicates of QC data). These data suggest that videotapes, if prepared properly, are an effective tool for monitoring motility analysis. However, it should also be noted that the SFF study data showed a slightly greater, though not significant, difference between the simple percent motility values and the collapsed “a plus b plus c” values than was seen with the QC data. Simple motility was 6.6% lower than “a plus b plus c” values for the SFF study data but, on average, was only 1.1% lower for QC data. Ideally, the simple percent motility should be identical to the collapsed categorical values of “a plus b plus c.”

Prior to training the SFF technicians, a representative from the central laboratory visited the Department of Growth and Reproduction, Rigshospitalet, Copenhagen, Denmark, to learn the method of categorical sperm motility assessment being used in the European studies of fertile men (Jørgensen et al, 2001). At the initial SFF technician training session for the current study, it became clear that the SFF technicians who had been performing categorical motility assessment as part of their routine clinical responsibilities used criteria for classifying sperm into the “a” and “b” categories that were very different from those used in the European studies. In general, the SFF technicians tended to categorize most progressive sperm as “a” sperm, with only a few sperm being classified in the “b” category. In contrast, the European method used the “a” and “b” categories to separate the linear and rapidly progressive sperm from the moderately or slowly progressive sperm. Because of the subjectivity of the technique, it was decided that it was probably not possible for the SFF study technicians to use one set of standards for their clinical work and another set of standards for the SFF study subjects while maintaining precision. Therefore, attempts to match the European method were abandoned, and the technique currently in use by the majority of the SFF technicians was taught to all SFF technicians. When the QC data were analyzed, there was evidence of a systematic difference between the study sites and the central laboratory in the assignment of sperm to the “a” and “b” categories, as the central laboratory technicians still tended more toward the European differentiation of “a” and “b” sperm.
Therefore, percent difference values from the central standard were not used to monitor performance for this motility method.

Our results suggest that a subjective method for the categorical assessment of sperm motility cannot be adequately standardized for use in multicenter studies. Individual andrology laboratories, and technicians within those laboratories, have different standards by which sperm are assigned to “a” and “b” categories. This lack of standardization is due to both the subjectivity of the method and the changing description of the standards for category assignment (WHO, 1987, 1992, 1999). There have been reports that the standardization of motility assessments among collaborating laboratories improves over time (Cooper et al, 1999). However, when clinical laboratories are involved in multicenter semen studies, the andrology technicians are likely to continue to perform clinical semen evaluations as well as those for the research study. In our experience, clinical laboratories are unwilling or unable to change the standards for clinical evaluations. According to the standards used clinically by most of the participating laboratories in the SFF, most rapidly and moderately progressive sperm were assigned to the “a” category, and it can be questioned whether it is useful to separately categorize the slowly progressive “b” and nonprogressive “c” sperm. The type of motility data required for any multicenter study should be clearly defined when the study is designed. If the study goal is to precisely measure relative differences in the progression of individual sperm in an ejaculate, then a more sophisticated and objective methodology such as computer-aided sperm analysis (CASA) is required. If not, then a simple motility assessment with a progression score may be sufficient.

There is no standard approach for analyzing QC data to assess the precision of semen evaluation procedures. Some investigators have calculated coefficients of variation on the basis of individual sperm concentration values; others have used mean values to calculate coefficients of variation. Some investigators reported coefficients of variation on raw data, and others reported coefficients of variation after transforming the means. These different approaches lead to different estimates of variability. Use of coefficients of variation for assessing variation is complicated by the fact that the coefficient of variation is clearly related to the mean (Cooper et al, 1992; Clements et al, 1995). In the present study, we reported the range and mean of intertechnician coefficients of variation determined from individual semen specimens as well as the intertechnician coefficient of variation determined from the mean of all QC samples scored by each technician. The data obtained from the mean of all samples suggest much greater precision than those from individual samples. However, both approaches provide valid measurements of intertechnician variability in this study.
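The gap between the two summaries described above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the counts are invented, the function names are hypothetical, and the sample standard deviation (n − 1 denominator) is used because this section does not state which form the study applied.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (%), using the sample standard deviation."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical sperm concentrations (10^6/mL): one row per technician,
# one column per QC specimen. These numbers are invented, not study data.
counts = {
    "tech_A": [50, 80, 20, 100],
    "tech_B": [60, 70, 25, 90],
    "tech_C": [55, 90, 15, 95],
}


def mean_specimen_cv(counts):
    """Approach 1: CV across technicians for each specimen, then averaged."""
    per_specimen = zip(*counts.values())  # regroup values by specimen
    return mean(cv_percent(list(vals)) for vals in per_specimen)


def cv_of_technician_means(counts):
    """Approach 2: CV across technicians of each technician's overall mean."""
    return cv_percent([mean(row) for row in counts.values()])


print(f"mean of per-specimen CVs: {mean_specimen_cv(counts):.1f}%")
print(f"CV of technician means:   {cv_of_technician_means(counts):.1f}%")
```

With these invented counts the per-specimen approach gives about 13%, while the CV of technician means is 2%, because disagreements on individual specimens largely cancel when each technician's values are averaged first. This is the same direction of effect as the 15.2% versus 8% reported for hemacytometer counts.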
Two approaches have been used previously for assessing the accuracy of semen evaluation procedures. The first involves comparing each technician’s values to the mean of all technicians tested. One problem with this method is that it cannot detect common errors made by all members of the group (Keel et al, 2000). For example, it has been suggested that technicians generally tend to overestimate percent motility (WHO, 1999), a phenomenon we have frequently observed in our training sessions. The second approach for assessing the accuracy of semen evaluation procedures is to compare the mean value of each technician to those of a group of technicians who have shown consistency through time and who have good QC methods in place in their laboratory. This is the approach we used in the present study. Our decision to certify a technician was made following a comparison of the QC data to the mean values obtained by 2 technicians in the central laboratory. Similarly, during the course of the study and during routine QC assessments, we used the same strategy to determine if corrective action was needed at a study site. Additionally, we routinely compared the individual technician’s values to the mean value of all technicians. In general, values from comparisons to “all technicians” were similar to those obtained with comparisons to the central standard technicians; however, for hemacytometer counts, the 2 standard technicians tended to be slightly higher than the majority of the SFF technicians.

In our QC activities, in general, all sperm concentrations determined by the MicroCell chamber were lower than those obtained with the hemacytometer chamber, even though all dilution vials were theoretically identical and were randomized to the different counting chambers. These results are consistent with data obtained from ejaculates of study subjects that were counted by the 2 methods, as well as data obtained by counting standard preparations of Accubeads (Brazil et al, 2004). In all QC activities, we consistently saw less intra- and intertechnician variability with the MicroCell than with the hemacytometer, even though the hemacytometer counts were higher. Some possible explanations for the greater precision of the MicroCell chamber have been discussed elsewhere (Brazil et al, 2004).

The original study goal of technician values differing less than 15% from the standard value was not met for hemacytometer counts by most technicians. Although preparation of the QC materials for hemacytometer counts was shown to affect variability in counting, it is still unclear how much of the overall variability is related to the counting chamber and how much is related to the QC process. Even though the QC results that we report compare favorably with those from other published studies, they still indicate fairly large differences between results obtained by different technicians, especially for hemacytometer counts.
Whether these differences are real or how much they are influenced by the QC testing methods is unknown.

We conclude that the andrology laboratory training and QC program in the SFF met our objectives of maintaining intratechnician coefficients of variation of semen measurements and percent differences from standard values of 15% or less for sperm concentration determination by MicroCell and for visual determination of percent motility. While the mean intratechnician coefficient of variation for hemacytometer was less than 15%, intertechnician coefficients of variation averaged 15.2% for the hemacytometer, and percent difference values from the standard for mean hemacytometer counts exceeded 16%. We will continue our efforts to standardize hemacytometer measurements among laboratories and to develop better QC methods for assessing their effectiveness.
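The two certification criteria (percent difference from the standard of 15% or less, and intratechnician CV of 15% or less) can be applied mechanically to the mean values in Tables 3 and 4, as in the Python sketch below. This is a simplification for illustration: the study applied its criteria during certification and QC rounds, not to these long-run means, and the names `meets_standard` and `passing` are hypothetical.

```python
LIMIT = 15.0  # percent; both criteria are stated as "15% or less"

# Per technician: (mean % difference from standard [Table 4],
#                  mean intratechnician CV [Table 3])
MICROCELL = {
    "Technician 1": (12.8, 12.6),
    "Technician 2": (12.7, 9.9),
    "Technician 3": (13.7, 8.4),
    "Technician 4": (13.5, 10.9),
    "Technician 5": (14.6, 11.5),
}
HEMACYTOMETER_ALL_DATES = {
    "Technician 1": (14.1, 10.6),
    "Technician 2": (18.9, 12.6),
    "Technician 3": (16.1, 11.9),
    "Technician 4": (16.8, 17.4),
    "Technician 5": (23.3, 15.1),
}


def meets_standard(pct_difference, intra_cv, limit=LIMIT):
    """True when both the accuracy and the precision criterion are met."""
    return pct_difference <= limit and intra_cv <= limit


def passing(results):
    """Names of technicians whose mean values satisfy both criteria."""
    return [name for name, (diff, cv) in results.items()
            if meets_standard(diff, cv)]


print("MicroCell:    ", passing(MICROCELL))
print("Hemacytometer:", passing(HEMACYTOMETER_ALL_DATES))
```

On these mean values all 5 study technicians satisfy both criteria for the MicroCell, and only technician 1 does so for the hemacytometer, which agrees with the statement that the goal was not met for hemacytometer counts by most technicians.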


Acknowledgments

In addition to the authors, the Study for Future Families Research Group includes: from the University of Missouri, Columbia: B. S. Carter, RN; D. J. Kelly, RN; R. L. Kruse, PhD; S. L. Stewart, BA; and T. M. Simmons, BS; from the University of California, Davis: C. Treece, CLA; and C. Tollner, BS; from the Harbor-UCLA Medical Center, Torrance: R. S. Swerdloff, MD; L. Lumbreras, PA; S. Villanueva, RN, NP; M. Diaz-Romero, MD; A. Victoroff, PA; R. Sandoval, BA; S. Baravarian, PhD; A. Leung, HTC; and A. L. Nelson, MD; from the Cedars-Sinai Medical Center, Los Angeles: C. Hobel, MD; and B. Brock, MD; from the Mt Sinai School of Medicine, New York City: M. Hatch, PhD; M. Pfeiffer, MS; L. Quinones, BA; K. Polgar, PhD; and A. Brembridge, MPH; and from the University of Minnesota, Minneapolis: C. Kwong, MPH; A. Muehlen, BA; T. Perrier, MLT; T. Srb, BS; J. Pryor, MD; and C. De Jonge, PhD. We also acknowledge the valuable scientific guidance provided by Dr Gwen Collman, National Institute of Environmental Health Sciences, National Institutes of Health, and the contributions of the physicians, midwives, and staff of the University Physicians Clinic, Columbia, Mo; the Fairview Riverside Women’s Clinic, Minneapolis, Minn; the Harbor-UCLA Medical Center, Torrance, Calif; the Cedars-Sinai Medical Center, Los Angeles, Calif; the Mt Sinai Medical Center, New York, NY; and participants in the Study for Future Families.

References

Auger J, Eustache F, Ducot B, et al. Intra- and inter-individual variability in human sperm concentration, motility and vitality assessment during a workshop involving ten laboratories. Hum Reprod. 2000;15:2360–2368.
Björndahl L, Kvist U. Basic semen analysis courses: experience in Scandinavia. In: Ombelet W, Bosmans E, Vandeput H, et al, eds. Modern ART in the 2000s. Andrology in the Nineties. New York, NY: Parthenon Publishing Group; 1998:91–101.
Brazil C, Swan S, Drobnis E, Liu F, Wang C, Redmon JB, Overstreet J. Standardized methods for semen evaluation in a multicenter research study. J Androl. 2004;25:635–644.
Clements S, Cooke ID, Barratt CL. Implementing comprehensive quality control in the Andrology laboratory. Hum Reprod. 1995;10:2096–2106.
Cooper TG, Atkinson AD, Nieschlag E. Experience with external quality control in spermatology. Hum Reprod. 1999;14:765–769.
Cooper TG, Neuwinger J, Bahrs S, Nieschlag E. Internal quality control of semen analysis. Fertil Steril. 1992;58:172–178.
Dunphy BC, Kay R, Barratt CLR, Cooke ID. Quality control during the conventional analysis of semen, an essential exercise. J Androl. 1989;10:378–385.
Gandini L, Menditto A, Chiodo F, Lenzi A. Italian pilot study for an external quality control scheme in semen analysis and antisperm antibodies detection. Int J Androl. 2000;23:1–3.
Guzick DS, Carson SA, Coutifaris C, et al. Efficacy of superovulation and intrauterine insemination in the treatment of infertility. N Engl J Med. 1999;340:177–183.
Guzick DS, Overstreet JW, Factor-Litvak P, et al. Sperm morphology, motility, and concentration in fertile and infertile men. N Engl J Med. 2001;345:1388–1393.
Jequier AM, Ukombe EB. Errors inherent in the performance of a routine semen analysis. Br J Urol. 1983;55:434–436.
Jørgensen N, Anderson AG, Eustache F, et al. Regional differences in semen quality in Europe. Hum Reprod. 2001;16:1012–1019.
Jørgensen N, Auger J, Giwercman A, et al. Semen analysis performed by different laboratory teams: an intervariation study. Int J Androl. 1997;20:201–207.
Keel B, Quinn P, Schmidt C, Serafy N Jr, Serafy N Sr, Schalue T. Results of the American Association of Bioanalysts national proficiency testing programme in andrology. Hum Reprod. 2000;15:680–686.
Knuth UA, Neuwinger J, Nieschlag E. Bias to routine semen analysis by uncontrolled changes in laboratory environment: detection by long-term sampling of monthly means for quality control. Int J Androl. 1989;12:375–383.
Matson PL. Quality control assessment for semen analysis and sperm antibody detection: results of a pilot scheme. Hum Reprod. 1995;10:620–625.
Mortimer D, ed. Technician training and quality control aspects. In: Practical Laboratory Andrology. Oxford, United Kingdom: Oxford University Press; 1994:337–347.
Mortimer D, Shu MA, Tan R. Standardization and quality control of sperm concentration and sperm motility counts in semen analysis. Hum Reprod. 1986;1:299–303.
Muller CH. The andrology laboratory in an assisted reproductive technologies program. J Androl. 1992;13:349–360.
Neuwinger J, Behre HM, Nieschlag E. External quality control in the andrology laboratory: an experimental multicenter trial. Fertil Steril. 1990;54:308–314.
Overstreet JW, Brazil CK. Semen analysis. In: Lipshultz L, Howards S, eds. Infertility in the Male. St Louis, Mo: Mosby; 1997:487–490.
Overstreet JW, Fuh VL, Gould J, et al. Chronic treatment with finasteride daily does not affect spermatogenesis or semen production in young men. J Urol. 1999;162:1295–1300.
Swan SH, Brazil CK, Drobnis E, et al. Geographic differences in semen quality of fertile US males. Environ Health Perspect. 2003;111:414–420.
Walker RH. Pilot surveys for proficiency testing of semen analysis. Arch Pathol Lab Med. 1992;116:423–424.
World Health Organization. WHO Laboratory Manual for the Examination of Human Semen and Semen–Cervical Mucus Interaction. 2nd ed. Cambridge, United Kingdom: Cambridge University Press; 1987.
World Health Organization. WHO Laboratory Manual for the Examination of Human Semen and Semen–Cervical Mucus Interaction. 3rd ed. Cambridge, United Kingdom: Cambridge University Press; 1992.
World Health Organization. WHO Laboratory Manual for the Examination of Human Semen and Semen–Cervical Mucus Interaction. 4th ed. Cambridge, United Kingdom: Cambridge University Press; 1999.
