Aust. N. Z. J. Statist. 42(2), 2000, 159–177

A GENERALIZED ESTIMATING EQUATIONS APPROACH FOR ANALYSIS OF THE IMPACT OF NEW TECHNOLOGY ON A TRAWL FISHERY

JANET BISHOP∗1, DAVID DIE1 AND YOU-GAN WANG2,3

CSIRO Marine Research and CSIRO Mathematical & Information Sciences

Summary

The article describes a generalized estimating equations approach that was used to investigate the impact of technology on vessel performance in a trawl fishery during 1988–96, while accounting for spatial and temporal correlations in the catch–effort data. Robust estimation of parameters in the presence of several levels of clustering depended more on the choice of cluster definition than on the choice of correlation structure within the cluster. Models with smaller cluster sizes produced stable results, while models with larger cluster sizes, that may have had complex within-cluster correlation structures and that had within-cluster covariates, produced estimates sensitive to the correlation structure. The preferred model arising from this dataset assumed that catches from a vessel were correlated in the same years and the same areas, but independent in different years and areas. The model that assumed catches from a vessel were correlated in all years and areas, equivalent to a random effects term for vessel, produced spurious results. This was an unexpected finding that highlighted the need to adopt a systematic strategy for modelling. The article proposes a modelling strategy of selecting the best cluster definition first, and the working correlation structure (within clusters) second. The article discusses the selection and interpretation of the model in the light of background knowledge of the data and utility of the model, and the potential for this modelling approach to apply in similar statistical situations.

Key words: covariance; fishing power; generalized estimating equations; overdispersion; Poisson; spatial and temporal correlations.

Received November 1998; revised August 1999; accepted November 1999.
∗ Author to whom correspondence should be addressed.
1 CSIRO Division of Marine Research, PO Box 120, Cleveland, QLD 4163, Australia. e-mail: [email protected]
2 CSIRO Mathematical & Information Sciences, PO Box 120, Cleveland, QLD 4163, Australia.
3 Dept Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115, USA.
Acknowledgments. The authors thank the fishers of the Northern Prawn Fishery who provided their catch records, and Carolyn M. Robins (Bureau of Rural Sciences, Canberra, Australia) and Margot Sachse (Australian Fisheries Management Authority, Canberra) for establishing a validated dataset. The authors are grateful for comments on earlier drafts from Richard Morton (CSIRO CMIS, Canberra) and Andre Punt (CSIRO Marine Research, Hobart) and three anonymous reviewers, that influenced the direction of the final version. This project was supported by the Australian Fish Management Authority Research Fund.
© Australian Statistical Publishing Association Inc. 2000. Published by Blackwell Publishers Ltd, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden MA 02148, USA.

1. Introduction

In this paper, we show some practical complications in using the generalized estimating equations (GEE) approach to analyse data with spatial and temporal correlations, and suggest a strategy for resolving some of these problems, illustrating it by analysis of data from a fisheries application. The aim of the analysis is to estimate the extent to which new technology has increased the catching power of a fishing fleet. This topic is important for fishery managers, who generally share a common objective: to ensure the long-term viability of the fishery. To achieve this objective, many fisheries are managed by so-called ‘input controls’ such as restrictions on
licences, fishing seasons, gear or vessels. The need for, and effectiveness of, such restrictions are shown by several analyses among which is the topic of the current paper: fishing power — also known as catching power or the effectiveness of the time spent fishing. Accurate estimates of fishing power, and the monitoring of changes in the fishing power of the fleet over time are very important for supporting good decision-making in input-controlled fisheries. To estimate fishing power, catch is modelled as a function of days spent fishing (fishing effort), abundance of the stock, and vessel characteristics including the presence and configuration of new technology on board (Hilborn & Walters, 1992 pp . 126–132; Robins, Wang & Die, 1998). In this paper we analyse factors that contribute to fishing power in the tiger prawn component of Australia’s Northern Prawn Fishery. This fishery extends across northern Australia for a distance of nearly 1000 nautical miles (Robins et al., 1998). Commercial logbook records of daily catch weights and location of catch describe 98% of the catch of tiger prawns landed by the 243 trawlers in the fishery since the late 1980s. These data are typical of fisheries. They are repeated observations of catch rates from the same vessels fishing in various statistical areas. The data are catch weights of prawns, and we assume they arise from a Poisson process with overdispersion. They are observational not experimental data. Biological characteristics of the stocks lead to seasonal abundance fluctuations that affect catch rates, and that vary spatially at a spatial scale finer than those of any statistical areas. Any of these characteristics of the data may introduce spatial or temporal correlations into the error structure unless accounted for in analyses. Statistical analyses that ignore correlated errors can increase the risk of invalid scientific inferences (Cressie, 1993 Chapter 1.3; Diggle, Liang & Zeger, 1994 Chapter 1). 
Therefore, we reasoned that intercorrelations in the data should be accounted for in the analysis of fishing power. Statistical methods that account for such correlations in analyses of binary, count and ordinal data have been developed over the last decade (Liang & Zeger, 1986; Zeger & Liang, 1986, 1992; Diggle et al., 1994). We use GEE methods (Liang & Zeger, 1986; Zeger & Liang, 1986) to investigate the impact of new technology in the fishery, and to take account of any spatial and temporal correlations in catch rates. The GEE methods offer several advantages; however, to gain these advantages certain conditions apply. Recently, Balemi & Lee (1999) obtained finite-sample expansions of bias and efficiency of estimates from the GEE approach with mis-specified working correlation matrices. Their main findings are (i) bias and efficiency depend on the combination of a number of characteristics of the data: cluster size, intra-cluster correlation of covariates, intra-cluster correlation of response variable, variability of cluster size, and the relative response association, and (ii) the performance of GEE is excellent for moderate degrees of response correlation and small clusters. The literature on GEE does not yet include comprehensive rules or recommendations for modelling strategies, such as are available for generalized linear modelling (GLM), although we acknowledge this situation is changing rapidly as the literature on theory and applications of GEE expands. Considerable time and effort can be spent on modelling correlation structures with the GEE approach, and beforehand there is no way to know with any certainty how much advantage can be obtained from a GEE model for a particular dataset. We are interested in a practical strategy for modelling to limit the amount of effort required in developing the model while aiming to use the most appropriate one for the analysis. 
Section 2 gives some background on GEE methods, followed by a description of the problem and the data. Section 3 describes the statistical model and the modelling strategy.
Section 4 reports the results. Finally, Section 5 discusses the model selection and interpretation in the light of background knowledge of the data and utility of the model, and the appropriateness of our modelling approach to tackle similar statistical applications.

2. Background

2.1. GEE

Let $Y_i = (Y_{i1}, \ldots, Y_{in_i})^T$ represent the vector of $n_i$ measurements on the $i$th cluster and $X_i = (x_{i1}, \ldots, x_{ip})^T$ be the $p$ vectors of independent variables (the $p$ explanatory variables) on the $i$th cluster. The GEE method of Liang & Zeger (1986) allows regression modelling of data that are not multivariate normal, when the data can be modelled as a GLM except for the correlation among responses. Only the marginal distributions are modelled parametrically. The mean vector of $Y_i$ is assumed to be $\mu_i = h^{-1}(X_i\beta)$, where $h$ is a link function. The estimating equation for $\beta$ is

$$\sum_{i=1}^{n} \left(\frac{\partial\mu_i}{\partial\beta}\right)^{T} V_i^{-1}\,(Y_i - \mu_i) = 0,$$

where $V_i$ is a ‘working’ covariance matrix for $Y_i$, given by

$$V_i = \phi A_i^{1/2} R_i A_i^{1/2},$$

where $\phi$ is a scale parameter following the quasi-likelihood approach (where the variance of $Y_{ij}$ is expressed as a known function of the expectation $\mu_{ij}$, that is, $\mathrm{var}(Y_{ij}) = \phi g(\mu_{ij})$, where $\phi$ is treated as a nuisance parameter and the focus of quasi-likelihood is on methods for inference about $\beta$); $A_i$ is an $n_i \times n_i$ diagonal matrix with $g(\mu_{ij})$ as the $j$th diagonal element; $R_i$ is an $n_i \times n_i$ ‘working’ correlation matrix for $Y_i$. Here, $R_i$ is referred to as a working correlation matrix because it does not have to be correctly specified. An approximate correlation structure is assumed. We often specify $R_i$ in terms of a vector of parameters which is denoted by $\rho$, in which case we write $R_i(\rho)$ to emphasize this dependence. Liang & Zeger (1986, pp. 17–18) describe several possible choices for the working correlation matrix $R_i$, ranging from the simple assumption that repeated observations are uncorrelated, through to the most complex possibility that the $n(n-1)/2$ correlations vary. The selection of working correlation structures has been discussed or illustrated in the statistical literature (Lipsitz, Kim & Zhao, 1994; Pepe & Anderson, 1994; Albert & McShane, 1995; Lumley, 1996; Chen & Ahn, 1997; O’Hara Hines, 1997; Sutradhar & Das, 1999).

In the GEE approach the specified model is fitted by computing an initial estimate of $\beta$, for example with an ordinary generalized linear model assuming independence, estimating the dispersion parameter $\phi$ from the standardized residuals, and computing the working correlations $R$ based on the standardized residuals and the assumed structure of $R$. Then an estimate of the covariance is computed, and the estimate of $\beta$ is updated. These steps are iterated until convergence. Liang & Zeger (1986) showed that the estimator $\hat\beta$ is consistent and asymptotically normal and that its variance can be consistently estimated by a sandwich type estimator.
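As a worked special case (an illustration, not part of the original derivation), take the log link and the variance function $g(\mu) = \mu$ that match an overdispersed Poisson assumption, and treat $X_i$ as the $n_i \times p$ matrix of covariates for cluster $i$, so that $A_i = \mathrm{diag}(\mu_{i1}, \ldots, \mu_{in_i})$. Under the independence working correlation, $R_i = I$, the estimating equation reduces to the Poisson GLM score equation, which suggests why an ordinary GLM fit is a natural starting value:

```latex
\begin{align*}
\mu_i &= \exp(X_i\beta), \qquad
\frac{\partial\mu_i}{\partial\beta} = A_i X_i, \qquad
V_i = \phi A_i^{1/2}\, I\, A_i^{1/2} = \phi A_i, \\
\sum_{i=1}^{n}\left(\frac{\partial\mu_i}{\partial\beta}\right)^{T} V_i^{-1}\,(Y_i-\mu_i)
 &= \sum_{i=1}^{n} X_i^{T} A_i\,(\phi A_i)^{-1}\,(Y_i-\mu_i)
  = \frac{1}{\phi}\sum_{i=1}^{n} X_i^{T}(Y_i-\mu_i) = 0 .
\end{align*}
```

The scale parameter $\phi$ cancels from the equation, so in this special case it affects the standard errors but not the point estimate of $\beta$.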
The estimator $M = I_0^{-1} I_1 I_0^{-1}$ is called the empirical or robust estimator of the covariance matrix of $\hat\beta$, where

$$I_1 = \sum_{i=1}^{n} \left(\frac{\partial\mu_i}{\partial\beta}\right)^{T} V_i^{-1}\,\mathrm{cov}(Y_i)\,V_i^{-1}\,\frac{\partial\mu_i}{\partial\beta}.$$

It has the property of being a consistent estimator of the covariance matrix of $\hat\beta$, even if the working correlation matrix is mis-specified, that is, if $\mathrm{cov}(Y_i) \neq V_i$, whereas the model-based estimator of $\mathrm{cov}(\hat\beta)$ is consistent only if both the mean model and the correlation matrix are correctly specified. The model-based estimator of $\mathrm{cov}(\hat\beta)$ is $\mathrm{cov}_M(\hat\beta) = I_0^{-1}$, where

$$I_0 = \sum_{i=1}^{n} \left(\frac{\partial\mu_i}{\partial\beta}\right)^{T} V_i^{-1}\,\frac{\partial\mu_i}{\partial\beta}.$$

A well-known and important property (robust property) of the GEE approach is that the estimates of mean parameters remain consistent even if the correlation or the covariance structure is mis-specified. Intuitively, careful modelling of the correlation structure would improve the statistical inference (Albert & McShane, 1995). Even higher moments can be incorporated into estimation using the generalized version of GEE, GEE2 (Liang, Zeger & Qaqish, 1992). However, bias can arise and efficiency can be lost if higher moment assumptions are incorrect. So to extract full information from the data, as robustly and efficiently as possible, ‘safe’ modelling is important.

In fact, in many practical cases, ‘cluster’ is not naturally defined and very different estimates may be produced when different sensible ‘working’ matrices are used. It is not an easy task to pick the ‘right’ one. In our case, year, month, vessel and spatial area variations have to be taken into account. The cluster level is not clear. If we allow all the possible combinations of factors to be different clusters, we have the ‘naïve’ independence model, which is the simplest model. On the other hand, if we assume that all the observations are correlated, we end up with only one cluster, which is the most complicated model (Lumley & Heagerty, 1999). Another idea is simply to use the independence model, ignoring data dependence completely (Pepe & Anderson, 1994; Sutradhar & Das, 1999). Developments along these lines are useful for certain types of data. However, it is not well known how much loss in estimation efficiency occurs in complicated cluster settings. One should not give up modelling correlation structure solely to secure regression estimation robustness. In fact, ignoring the correlations between observations is in many cases simply an unacceptable sacrifice of information (Fitzmaurice, 1995).
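The fitting iteration and the two covariance estimators described in this section can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the software used in the paper: it assumes a log link, the variance function g(μ) = μ, an exchangeable working correlation, a simple moment estimate of ρ clipped to [0, 0.95], and simulated data. The names `gee_pieces` and `fit_gee` are hypothetical.

```python
import numpy as np

def gee_pieces(beta, rho, phi, y, X, groups):
    """Return the score U and the matrices I0 and I1, summed over clusters."""
    p = X.shape[1]
    U, I0, I1 = np.zeros(p), np.zeros((p, p)), np.zeros((p, p))
    mu = np.exp(X @ beta)                        # log link: mu = exp(X beta)
    for g in groups:
        m = len(g)
        D = mu[g][:, None] * X[g]                # d mu_i / d beta for the log link
        a = np.sqrt(mu[g])                       # diagonal of A_i^{1/2}, g(mu) = mu
        R = (1 - rho) * np.eye(m) + rho * np.ones((m, m))  # exchangeable R_i
        V = phi * a[:, None] * R * a[None, :]    # V_i = phi A^{1/2} R A^{1/2}
        VinvD = np.linalg.solve(V, D)            # V_i^{-1} D_i
        u = VinvD.T @ (y[g] - mu[g])             # D_i^T V_i^{-1} (Y_i - mu_i)
        U += u
        I0 += D.T @ VinvD
        I1 += np.outer(u, u)                     # cov(Y_i) replaced by residual outer product
    return U, I0, I1

def fit_gee(y, X, cluster, max_iter=50, tol=1e-8):
    """Illustrative GEE fit; assumes column 0 of X is an intercept."""
    y, X = np.asarray(y, float), np.asarray(X, float)
    cluster = np.asarray(cluster)
    groups = [np.flatnonzero(cluster == c) for c in np.unique(cluster)]
    n, p = X.shape
    beta = np.zeros(p)
    beta[0] = np.log(y.mean())                   # crude starting value
    rho, phi = 0.0, 1.0
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        r = (y - mu) / np.sqrt(mu)               # standardized (Pearson) residuals
        phi = r @ r / (n - p)                    # dispersion estimate
        cross = sum((r[g].sum() ** 2 - (r[g] ** 2).sum()) / 2 for g in groups)
        pairs = sum(len(g) * (len(g) - 1) // 2 for g in groups)
        # Assumption: simple moment estimate, clipped to keep R_i positive definite.
        rho = float(np.clip(cross / (phi * max(pairs, 1)), 0.0, 0.95))
        U, I0, _ = gee_pieces(beta, rho, phi, y, X, groups)
        step = np.linalg.solve(I0, U)            # Fisher-scoring update of beta
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    _, I0, I1 = gee_pieces(beta, rho, phi, y, X, groups)
    I0inv = np.linalg.inv(I0)
    return {"beta": beta, "rho": rho, "phi": phi,
            "cov_model": I0inv,                  # model-based estimator I0^{-1}
            "cov_robust": I0inv @ I1 @ I0inv}    # sandwich estimator M

# Simulated example (hypothetical data): 200 clusters of 4 correlated counts.
rng = np.random.default_rng(0)
n_clusters, size = 200, 4
cluster = np.repeat(np.arange(n_clusters), size)
x = rng.uniform(0, 1, n_clusters * size)
shared = np.repeat(rng.gamma(4.0, 0.25, n_clusters), size)  # mean-1 cluster effect
y = rng.poisson(np.exp(1.0 + 0.5 * x) * shared)
X = np.column_stack([np.ones_like(x), x])
fit = fit_gee(y, X, cluster)
print(fit["beta"], fit["rho"], fit["phi"])
print(np.sqrt(np.diag(fit["cov_model"])), np.sqrt(np.diag(fit["cov_robust"])))
```

Comparing the square roots of the diagonals of `cov_model` and `cov_robust` gives a rough check on how much the inference depends on the working correlation being correct.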
One unsatisfactory aspect of the GEE approach is a lack of goodness of fit statistics for variable selection or correlation structure selection. For example, the goodness of fit statistics produced by the GENMOD procedure recently developed in SAS (SAS Institute, Inc., 1997) are all derived from the independence model and hence are invalid for clustered data. Therefore, no guidelines are available for comparing various working correlation structures.

2.2. The impact of new technology factors on fishing power

The main aim of this analysis is to achieve accurate parameter estimates for the impact of the new technology factors on fishing power, especially for the technology factors that are potential targets for management restrictions. Modelling of the variance components or correlation structure is of little direct interest in itself. Nevertheless, to interpret the results, it is necessary to understand where the inter-correlations in this dataset might come from. The possible sources of inter-correlations in the data are described here, along with the model of fishing power, considering in turn each of the main factors that predict fishing power. The main categories of factors that affect catch per day are abundance of the stock, and fishing
power. Fishing power is determined by swept area (the area or volume covered by the trawl gear, per time unit), and the quality of the vessel and crew as a catching unit.

2.3. Abundance

We are unable to describe the population dynamics (recruitment and mortality) of prawn stocks formally at the spatial and temporal scales of a single fishing day for a single vessel. However, we know that population biomass is likely to change with time (between years and months) and area. The generation time of tiger prawns is 6–12 months. There is a strong seasonal pattern in abundance, and catch rates in adjacent months are correlated, while abundance in adjacent years is only moderately correlated (see Figures 1 and 2; and Wang & Die, 1996).

Two species of prawns constitute the tiger prawn component of the fishery. The two species differ in habitat preferences and life cycles. There are probably two or more distinct stocks of each species within the large area of the fishery. Some localities support a single species, others a mixture. The main predictor of species distribution is whether the habitat sediment type is predominantly mud or sand (Somers, 1994). Localities with historically high catches occur within each statistical area, but abundance at these localities does not always correlate with the overall abundance pattern for the month, year and area. Although we expect spatial variation in stocks, we have no reason to suspect the existence of a consistent geographical trend in catch rates.

In theory, abundances of separate stocks do not have to be correlated, but there is some evidence that large-scale environmental variability affects wide areas of the fishery. This can generate correlation in abundance between statistical areas. Localities supporting different species may have negatively correlated abundance. In summary, there may be spatial and temporal correlations arising from the biology and distribution of the two species.
Thus, the observations must be classified by area and time factors to account for changes in the characteristics of populations and thus to help explain variations in the catch. For this reason, area, year and month terms with all their interactions are in the main model (Robins et al., 1998). However, variations in abundance that occur at a scale of resolution other than those of the area, year and month classifiers lead to correlated errors.

2.4. Swept area performance

The main factors determining swept area are the width spanned by the nets under operational conditions, the trawl speed, and the vessel’s ability to maintain trawl speed. The trawl speed is determined by such characteristics as the size of the vessel, engine power and propeller configuration, and drag from towing the gear. To represent these factors that determine swept area performance in this analysis we included vessel length, hull type (wooden, old steel and new steel), A-units (a measure of the vessel’s ability to maintain trawl speed, defined as engine power plus tonnage), the width of the nets (headrope length) and the presence of a kort nozzle (a cylindrical device around the propeller that has the hydrodynamic effect of increasing the thrust of the propeller, allowing larger nets to be towed for the same horsepower, or, perhaps more importantly, allowing faster towing speeds for the same net).

2.5. Catching unit: vessel and crew

The vessel is traditionally the observational unit with which catch is associated. It is often assumed that vessel characteristics equate to the characteristics of the catching unit, when in fact the crew are also part of the catching unit. Skipper experience contributes an important dimension to the catching power of each vessel, and skippers do not always skipper the same boat from year to year.

Technology that helps the skippers target their nets to the areas of highest catch is also important. In this analysis we have investigated the type of trygear (a small trawl gear that is used as a sampling device to monitor catches), the towing position for trygear, the presence of Global Positioning System (GPS) and plotter systems, and the experience of the skippers. Information from catch in the trygear is used as the basis of decision-making about how to locate the highest concentrations of prawns, and accurate position information from GPS and plotter systems aids in retracing successful routes and avoiding undesirable areas.

Catch rates from the same vessel might differ between different areas due to specializations of the boat and its equipment, or specializations of the skipper, for fishing in different areas. Catch rates might vary between different months due to change of decision-making strategies during the fishing season in response to combinations of declining catch rates and external factors such as fuel or market prices, bad weather, mechanical problems on board, or health problems among the crew.

About half the vessels in the fleet are operated by one of several major companies. The remainder of the vessels are single owner-operated or small-company boats. While vessels in the same company share information and strategies, many skippers (including both company and independent operators) work closely with one, two or three others, sharing information to an even greater extent, for example by phoning each other several times each fishing night to discuss catch and strategies (Bishop & Sterling, 1999). Major changes in vessel and crew characteristics occur between years rather than between months within years, because vessel refits and changes in skipper occur at the end of the fishing season coinciding with the end of calendar years.

[Figure 1. Catch of tiger prawns (kg per day per vessel) for four months per year (August–November) over nine years (1988–1996) for four areas in the Northern Prawn Fishery. Panels: Weipa (1), Mornington (3), Vanderlins (4), Melville (8).]

[Figure 2. Catch of tiger prawns (kg per day per vessel) over four months per year (August–November) for nine years, for area 5 in the Northern Prawn Fishery (Groote). Observations for randomly selected boats are joined by lines within each year.]

2.6. Data sources

The analysis used commercial fishery logbook data collected from trawler skippers. The logbooks record each trawler’s daily catch, and the position of the greatest catch of the day.
The total catch, and the corresponding fishing effort, were determined for each vessel in the fishery in each of eight possible statistical areas each month, in each year (depicted in Figures 1 and 2). We excluded any observations based on fewer than four fishing days in any month, to remove less reliable data. Records from only August to November, the four main months
of the fishing season, were included. The final dataset used for the analysis accounted for between 63% and 71% of the total catch and fishing effort per year between 1988 and 1996 in the fishery.

Information about vessel characteristics and technology on board was collated from vessel licence registers and interviews with skippers, trawler owners, fleet managers and sometimes ships’ chandlers. Two indices of skipper experience were obtained by linking lists of skippers with lists of equipment on board their trawlers: first, the number of seasons (there are two seasons in each year) of experience as skipper in the fishery since 1988; and second, the number of years’ experience with a GPS and plotter system on board. Each record contained catch and effort for vessel i in area j during month t of year k, along with the vessel’s characteristics, technology on board, and skipper experience, all as at the start of the fishing season in year k.

The dataset had some further characteristics.
• The main covariates of interest (the technology factors) could change within vessels over time. However, these factors changed only between years, never between months. Only one vessel characteristic, vessel length, never changed for a vessel.
• The time series had unevenly spaced time steps (catches for each month from August to November for nine years) because the fishing season is restricted in time each year.
• Vessels could fish in more than one area each month, but vessels usually fished in only a subset of all possible areas in any one year, and areas fished by a vessel varied from year to year. This produced an unbalanced design.
• The population of vessels was dynamic, with vessels entering and leaving the fleet each year, for reasons that might or might not be associated with their success at fishing (Dann & Pascoe, 1994, pp. 10–14).

In summary, the aim of the analysis was to estimate the contribution of new technology to increasing the catching power of the fleet, by either increasing swept area or improving the catching success of the vessel-plus-crew combination. While we believe that spatial and temporal correlations occur in the data, we are not sure how to describe the cluster and correlation structure. Therefore the robustness of the GEE approach is very attractive as a form of insurance against some of the limitations of our observational data.

3. Statistical model

The model has a log-link function and terms for each of the technology factors and vessel characteristics being investigated, plus skipper experience, fishing effort, and the three-way interaction year-month-area with all its main effects and lower order interactions. A level for ‘unknown’ was included in each of the categorized vessel technology factors to allow vessels with unknown status on one or more of these items to be retained in the analysis.
The expected catch of vessel $i$ fishing in area $j$, year $k$ and month $t$ is such that

$$\log(\mu_{ijkt}) = \alpha + \alpha_1 \log V_{1i} + \alpha_2 \log V_{2ik} + \alpha_3 \log V_{3ik} + \alpha_4 V_{4i} + \beta_1 X_{1ik} + \beta_2 X_{2ik} + \beta_3 X_{3ik} + \beta_4 X_{4ik} + \beta_5 W_{1ik} + \beta_6 W_{2ik} + \delta \log E_{ijkt} + h(A.Y.M_{jkt}),$$

where $\mu_{ijkt}$ is the expected catch (weight in kg) of boat $i$ fishing in area $j$, year $k$ and month $t$; and $\alpha$ is the intercept.
The vessel characteristics are $V_1$–$V_4$:
• $V_{1i}$ is the vessel length (metres);
• $V_{2ik}$ is the total headrope length of gear being used (fathoms);
• $V_{3ik}$ is the vessel’s A-units (continuous);
• $V_{4i}$ is the hull material category (three dummy indicators to represent four categories: timber, old steel, new steel or unknown).
Here $\alpha_1$–$\alpha_3$ are the fixed effects of the respective vessel characteristics $V_1$–$V_3$; $\alpha_4$ is a vector of effects for $V_4$ with three categories.

The technology factors are $X_1$–$X_4$:
• $X_{1ik}$ is the kort nozzle status (two dummy indicators to represent three categories: present, absent, unknown);
• $X_{2ik}$ is the type of trygear used (two dummy indicators to represent three categories: otter, beam or unknown);
• $X_{3ik}$ is the position of towing the trygear (two dummy indicators to represent three categories: stern, other or unknown);
• $X_{4ik}$ is the presence or absence of a GPS.

The skipper experience indicators are $W_1$–$W_2$:
• $W_{1ik}$ is the number of years a skipper has worked with a plotter on board (range 0–6);
• $W_{2ik}$ is the number of seasons as a skipper in the fishery since 1988 (three dummy indicators to represent four categories: < 5 seasons, 5–8, 9–12 and > 12 seasons).

For the respective technology characteristics ($X_1$–$X_4$), the fixed effects are $\beta_1$–$\beta_4$. Here $\beta$ indicates vectors of parameters for multiple categories of each $X$; $\beta_5$ is the effect of skipper experience with a plotter (in years); $\beta_6$ is the effect of skipper experience in the fishery (for three season categories); $E_{ijkt}$ is the term for days spent fishing (fishing effort); $\delta$ is the fixed effect of fishing effort; $h(A.Y.M_{jkt})$ represents spatial and temporal changes in abundance of prawns arising from fluctuations in population reproduction and mortality rates, by the three-way interaction of area, year and month plus lower order terms and main effects (fixed effects for areas ($A_j$), years ($Y_k$), months ($M_t$) and all their interactions $A.Y$, $A.M$, $M.Y$ and $A.Y.M$).

3.1. Specifying the variance or cluster structure

We investigated ways of accounting for possible spatial and temporal associations in the data. Fitting a GEE model requires the definition of clusters of observations. Observations from within a cluster are assumed to be correlated, while observations from different clusters are assumed to be independent. Given the complexity of the data characteristics here, it was difficult to specify the cluster and correlation structure with confidence. We investigated several possible cluster structures, and within each, several possible correlation structures.

We considered there were several possible ways of defining a cluster, depending on the assumptions we made about the data. For example, when vessels were clusters, catches from the same vessel were assumed to be correlated at all times and locations, and catches from different vessels were assumed to be independent. When vessel-year-area were clusters, catches from the same vessel were assumed to be correlated between months within any year and area, but independent otherwise.

The possible cluster definitions (Table 1) fall into a natural hierarchy of complexity, in terms of the dimension of the associated GEE working correlation matrix (the cluster size). The model with vessels as clusters is the most complex, with a GEE working correlation matrix dimension of 268 (8 areas × 9 years × 4 months, but not all combinations occur). This cluster definition produced 243 clusters (vessels), and the number of observations per cluster ranged from 1 to 83. At the other extreme, the model with vessel-year-area as clusters is the simplest, with a GEE working correlation matrix dimension of four, because there are observations for only four months. This cluster definition produced 4890 clusters, and the number of observations per cluster ranged from one to four.
TABLE 1
Possible spatial and/or temporal clusters of correlated prawn catch observations, ordered by decreasing complexity (indicated by the GEE working correlation matrix dimension)

Cluster definition            Cluster size (GEE working       No. of      No. of observations
                              correlation matrix dimension)   clusters    per cluster: min–max
vessel                        268                             243         1–83
vessel-area                   36                              1378        1–32
vessel-year                   32                              1410        1–13
vessel-year-month             8                               5310        1–4
vessel-year-area              4                               4890        1–4
no clusters (independence)    1                               9905        1
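How a cluster definition determines the number of clusters and their sizes can be sketched in Python on a toy dataset. This is only an illustration: the seven records are hypothetical, and the helper names `summarise` and `max_dimension` are not from the paper. The factor levels (8 areas, 9 years, 4 months) are taken from the text; they give the largest possible working correlation dimension for each definition, which for vessel clusters is 8 × 9 × 4 = 288, an upper bound on the reported 268 because not all combinations occur.

```python
from collections import Counter

# Hypothetical records: (vessel, year, area, month). Illustrative values only.
FIELDS = ("vessel", "year", "area", "month")
records = [
    ("A", 1988, 1, 8), ("A", 1988, 1, 9), ("A", 1988, 5, 9),
    ("A", 1989, 1, 8), ("B", 1988, 1, 8), ("B", 1988, 1, 10),
    ("B", 1989, 5, 11),
]

CLUSTER_DEFINITIONS = {
    "vessel": ("vessel",),
    "vessel-area": ("vessel", "area"),
    "vessel-year": ("vessel", "year"),
    "vessel-year-month": ("vessel", "year", "month"),
    "vessel-year-area": ("vessel", "year", "area"),
}

# Number of levels of each classifying factor, as described in the text.
LEVELS = {"year": 9, "area": 8, "month": 4}

def summarise(records, keys):
    """Return (number of clusters, min cluster size, max cluster size)."""
    idx = [FIELDS.index(k) for k in keys]
    sizes = Counter(tuple(rec[i] for i in idx) for rec in records)
    return len(sizes), min(sizes.values()), max(sizes.values())

def max_dimension(keys):
    """Largest possible cluster size: product of levels of factors not in the key."""
    dim = 1
    for factor, n_levels in LEVELS.items():
        if factor not in keys:
            dim *= n_levels
    return dim

for name, keys in CLUSTER_DEFINITIONS.items():
    n_clusters, smallest, largest = summarise(records, keys)
    print(name, n_clusters, smallest, largest, max_dimension(keys))
```

Coarser cluster keys give fewer, larger clusters, and therefore a larger working correlation matrix to specify.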

The forms of correlation structure (and the definition of the corresponding correlation parameter, $\rho$) that we report on are:
• independent ($\rho = 0$);
• exchangeable or equi-correlated, $\mathrm{corr}(y_c, y_d) = \rho$ if $c \neq d$;
• auto-regressive (1), $\mathrm{corr}(y_c, y_d) = \rho^{|c-d|}$ if $c \neq d$;
• m-dependent or stationary, $\mathrm{corr}(y_c, y_d) = \rho_{|c-d|}$ if $|c-d| \le m$, and 0 otherwise;
• unstructured (unspecified, so estimation of all correlations is required).
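The structures listed above can be written out as explicit matrices. The following Python sketch is illustrative only: the function names are hypothetical, the matrices are plain nested lists, and the unstructured case is omitted from the builder because it has no formula, only free parameters. The parameter counts follow the usual convention of one parameter each for exchangeable and auto-regressive, m for m-dependent, and K(K−1)/2 for an unstructured matrix of dimension K.

```python
def working_correlation(structure, size, rho=0.0, m=1):
    """Build a size x size working correlation matrix as nested lists.

    For "m-dependent", rho is a sequence (rho_1, ..., rho_m), one value per lag.
    """
    if structure not in ("independent", "exchangeable", "ar1", "m-dependent"):
        raise ValueError("unsupported structure: " + structure)

    def entry(c, d):
        lag = abs(c - d)
        if lag == 0:
            return 1.0                      # diagonal of a correlation matrix
        if structure == "independent":
            return 0.0
        if structure == "exchangeable":
            return rho                      # same correlation for every pair
        if structure == "ar1":
            return rho ** lag               # decays with distance |c - d|
        return rho[lag - 1] if lag <= m else 0.0   # m-dependent

    return [[entry(c, d) for d in range(size)] for c in range(size)]

def n_correlation_parameters(structure, size, m=1):
    """Number of correlation parameters to estimate for each structure."""
    counts = {
        "independent": 0,
        "exchangeable": 1,
        "ar1": 1,
        "m-dependent": m,
        "unstructured": size * (size - 1) // 2,
    }
    return counts[structure]

# Example: four monthly observations within a vessel-year-area cluster.
for row in working_correlation("ar1", 4, rho=0.5):
    print(row)
print(n_correlation_parameters("unstructured", 4))
```

For a cluster of four months, the unstructured matrix already needs six parameters, against one for the exchangeable or auto-regressive forms.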

They are similar to those specified by Liang & Zeger (1986, pp. 17–18). The subscripts $c$ and $d$ refer to the cluster unit specified by the particular cluster-model being investigated. The auto-regressive and m-dependent structures were investigated as applied to months ($t = 1, \ldots, 4$), years ($k = 1, \ldots, 9$) and areas ($j = 1, \ldots, 8$, where areas were ordered geographically along the coastline of the fishery from east to west).

Other correlation structures are possible, and we investigated several more complex structures; for example, to take into account two levels of clustering, we specified varying correlations between months within and between years and areas. However, we restrict this report to investigations of correlations between months within year, independent between years, (1) because of prior knowledge of abundance described in Section 2; (2) because of results from the analysis of residuals from the independence GEE model (see Section 3.2, Strategy 5); and (3) because no gain in efficiency was found from the more detailed modelling of correlations.

The correlation structures can be ordered in a hierarchy of complexity, in terms of the number of correlation parameters that need to be estimated. Exchangeable and auto-regressive (the simplest structures) each require the estimation of one parameter; m-dependent requires the estimation of $m$ parameters, where $m \le (K-1)$ and $K$ is the dimension of the GEE working correlation matrix. Unstructured, the most complex, requires the estimation of $\frac{1}{2}K(K-1)$ parameters. The combination of correlation structures and cluster definitions provides a convenient framework within which correlation models can be described, where the independent / vessel-year-area and exchangeable / vessel-year-area models are the two simplest models, and the unstructured / vessel model is the most complex.

3.2. Modelling strategy

With the GEE approach, simplistic correlation models (that is, when cluster sizes are too small and/or there are too few correlation parameters) underestimate standard errors. Overly
complex correlation models overestimate standard errors. Under certain circumstances, misspecified correlation models lead to biased estimates. Our strategy was to look for a parsimonious model aided by the guidelines below. ‘When regression coefficients are the scientific focus . . . one should invest the lion’s share of time in modelling the mean structure, while using a reasonable approximation to the covariance’ (Diggle et al., 1994 p . 145). The strategy in 1–8 below assumes that the aim is to find efficient and consistent estimators of the mean, that the correlation is a nuisance, that the ‘lion’s share’ of prior modelling work has reached a stage of confidence at which the mean model is reasonably correctly specified, and that the choice of link function is correct (Zeger & Liang, 1986 p . 129). A further requirement for achieving consistent GEE results is that any missing data are completely at random or few in number and do not follow a pattern of absence that depends on previous outcomes (Zeger & Liang, 1986). Unbalanced designs can be accommodated. 1. Define a hierarchy of possible clusters, based on expert knowledge of the data. 2. Fit a GEE model with one simple correlation structure for each cluster model (for example, the exchangeable correlation structure). If the results indicate little correlation, adopt the independence model. Little gain in efficiency is found in GEE models over independence models when ρ is less than about 0.3 (Liang & Zeger, 1986 Tables 1 and 2). If there is a sizeable correlation within the clusters, as estimated by ρ, ˆ then proceed to find the best model to account for the correlations. 3. Fit GEE models with several candidate correlation structures within the chosen cluster definition. Appropriate choices of working correlation structures for various types of data have been discussed or illustrated by Liang & Zeger (1986 pp . 17–18), Lipsitz et al. 
(1994), Pepe & Anderson (1994), Albert & McShane (1995), Lumley (1996), Chen & Ahn (1997), and O’Hara Hines (1997). Although the GEE methods are robust to mis-specification so choice should not matter, the robustness breaks down under some conditions, whereupon more accurately modelled correlation structures are required to achieve good results (see steps 4 and 6). Therefore, select candidate correlation structures based on knowledge of the process from which the data were generated. 4. Choose the best cluster definition by choosing the most complex cluster definition possible that provides reasonably consistent parameter estimates across correlation structures within a cluster. The justification for this step is as follows: the estimates from the GEE approach are asymptotically consistent even if the covariance structure is mis-specified (Diggle et al., 1994 p . 145), with these exceptions: (a) coefficients for within-cluster covariates can differ markedly when the working correlation structure is grossly mis-specified (Zeger & Liang, 1986 Table 2) if predictor variables for one observation are correlated with the residuals for the other observation (Pepe & Anderson, 1994 p . 949; Lipsitz et al., 1994 p .1161; Fitzmaurice, 1995); (b) stability can be influenced by the pattern of missing values (even when missing values satisfy general assumptions). According to Lipsitz et al. (1994 p . 1162): c Australian Statistical Publishing Association Inc. 2000 


With missing data, each individual does not contribute equally weighted components to the effects of [within-cluster] covariates (even if the [within-cluster] covariate would have the same pattern across time for all individuals if there was no missing data), and we would expect more complicated correlation structures to be more efficient than the independence structure for estimating these parameters as well.

5. Analyse the residuals from the independence GEE model to investigate the relative contribution of the different clusters to the variance of the residuals, and to support the choice of cluster definition. We used analysis of variance components to investigate the residuals. Other authors have used different approaches, such as semivariogram models (Albert & McShane, 1995 p. 629 Section 4 and pp. 634–635) and time series methods (Lumley & Heagerty, 1999 pp. 471–472 and Figure 2).

6. Choose the best correlation structure within the chosen cluster definition:
(a) aim for as few correlation parameters as possible (Lipsitz et al., 1994; Lumley, 1996; O’Hara Hines, 1997);
(b) look for the structure producing the ratio of model-based to robust standard error estimates closest to 1.0 (e.g. Lipsitz et al., 1994 pp. 1159, 1161).
The justification for this step is as follows. There are some circumstances that are exceptions to the rule (of step 6a) that simple is best, namely, when the correlation is large (Liang & Zeger, 1986 Tables 1 and 2, p. 19) or when there are within-cluster covariates or missing values that do not have the same pattern for all clusters (as described in step 4). In all these circumstances the estimates are likely to be inefficient unless the correlation structure is reasonably accurately specified, and efficiency can be improved by specifying a correlation matrix closer to the true situation in the data. When the model-based and robust standard error estimates are similar, this indicates that the specified working correlation model is more consistent with the observed association. Zeger & Liang used the ratio of t-statistics (rather than standard errors) for model-based to robust results to guide the choice of model, where the preferred model had the ratio of t-statistics closest to one (Zeger & Liang, 1986 Table 3 and p. 128).

7. Check the model fit by inspecting the residuals and the covariance matrix.

8. For inference use the robust standard error estimates. Their use takes little effort, whereas the careful modelling of covariance structure needed before model-based standard error estimates can be used may take a great deal of effort for relatively small gains. The model-based estimate of the variance is consistent if both the mean model and the working correlation matrix are correctly specified; the robust estimate is consistent provided the mean model is correctly specified, even if the working correlation is not (SAS Institute, Inc., 1997 p. 296).

We also agree with the caution given by O’Hara Hines (1997 p. 1554) against overinterpreting the correlation coefficients or final model of cluster and correlations:

Attempts to interpret the estimated correlations (ρ) meaningfully in terms of what they say about the data can be futile, since quite different models for [the working correlation structure] were found to fit the data equally well.
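Step 6 of the strategy can be sketched as a small selection rule. This is an illustration under stated assumptions, not the procedure used in the analysis: the distance measure (mean absolute log-ratio), the tolerance, the function names and the input numbers are all hypothetical. The idea is to score each working correlation structure by how far its ratios of standard errors lie from 1, then prefer the structure with the fewest correlation parameters among those that score about equally well.

```python
import math
from typing import Dict, List


def distance_from_one(ratios: List[float]) -> float:
    """Mean absolute log-ratio; 0 means the two sets of standard errors agree exactly."""
    return sum(abs(math.log(r)) for r in ratios) / len(ratios)


def choose_structure(
    se_ratios: Dict[str, List[float]],
    n_params: Dict[str, int],
    tolerance: float = 0.1,  # arbitrary assumption: what counts as "about as good"
) -> str:
    """Step 6(b) then 6(a): ratios closest to 1, ties broken by parsimony."""
    scores = {name: distance_from_one(r) for name, r in se_ratios.items()}
    best = min(scores.values())
    near_best = [name for name, s in scores.items() if s - best <= tolerance]
    return min(near_best, key=lambda name: (n_params[name], scores[name]))


if __name__ == "__main__":
    # Hypothetical ratios for three coefficients under three structures (K = 4).
    se_ratios = {
        "independence": [1.8, 1.4, 1.5],
        "exchangeable": [1.2, 1.1, 1.1],
        "unstructured": [1.1, 1.1, 1.0],
    }
    n_params = {"independence": 0, "exchangeable": 1, "unstructured": 6}
    print(choose_structure(se_ratios, n_params))
```

Because the log-ratio is symmetric, the rule behaves the same whether the ratios are expressed as model-based to robust or as robust to model-based. With a tolerance of zero it reduces to step 6(b) alone.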


3.3. Model fitting

The GEE models were fitted by PROC GENMOD in SAS Version 6.12 (SAS Institute Inc., 1997 pp. 247–348; SAS code is available from the authors). PROC GENMOD implements the methods proposed by Liang & Zeger (1986), includes options for selecting the various working correlation structures they described, and also allows user-specified correlation structures. Two sets of standard errors were calculated as defined by Liang & Zeger (1986) and as described in Section 2: naïve (or model-based) and robust (or empirical). PROC GENMOD offers optional methods for calculating the dispersion parameter. We report the dispersion parameter φ and scale factor √φ estimated from the deviance divided by its degrees of freedom, the results being very similar to those from the Pearson chi-squared method proposed by Liang & Zeger (1986 p. 17). The residuals from the independence GEE model were analysed with PROC VARCOMP in SAS. Intra-cluster correlations were calculated as σ_i² / Σ σ_i².

4. Results

Rates of uptake during 1988 to 1996 varied for the different technology items that were studied. A few vessels had GPS and plotter systems in 1988 and all vessels had them by 1992. The use of otter-type trygear increased from 30% of vessels in 1988 to 73% of vessels in 1996. Towing trygear in the stern position increased from 47% of vessels in 1988 to 68% of vessels in 1996. Kort nozzles were quite common in 1988 (found on 86% of vessels) and continued to increase in use, to 99% of vessels by 1996. The commissioning of 12 custom-designed trawlers in 1994–96 and the departure of older timber vessels in 1993 also affected the fleet profile.

Catches of prawns decline each year from the start of fishing in August to the end of the season in November. Different years and areas can have quite different catch rates (Figures 1 and 2).
Parameter estimates from the independence model showed that effort was the main predictor of catch, as expected. After adjusting for background fluctuations in abundance, and for differences in vessel length, A-units, total headrope length of gear, and skipper experience, hull type, kort nozzle and GPS each affected efficiency by an important amount, of the order of 5–7% (Table 2), and efficiency increased each year following the installation of a plotter, for up to six years. The type of trygear (otter or beam) made no difference, but the position of trygear may have had a small effect.

4.1. Correlations

In this dataset, correlation parameter estimates from GEE models with exchangeable and AR(1) working correlation structures in different cluster structures ranged from ρ̂ = 0.25 to ρ̂ = 0.47, suggesting that moderate correlations existed (Table 2).

4.2. Clusters

In the simple models with small cluster sizes (clusters defined as vessel-year-area and vessel-year-month), the coefficients for the technology factors of interest were similar across correlation structures within the clusters (Table 2, a and b).


TABLE 2
Results from GEE models with different cluster and correlation specifications. For each correlation structure the columns show the parameter estimate (robust standard error) and the ratio of robust standard errors to model-based standard errors; the last row of each panel shows the correlation parameter estimate ρ̂ (a range for the unstructured model).

             Independence        Exchangeable        AR(1)               Unstructured
             Est. (SE)   Ratio   Est. (SE)   Ratio   Est. (SE)   Ratio   Est. (SE)   Ratio

(a) Cluster = vessel-year-area, correlations between months
gear         0.41 (.030)  1.7    0.41 (.028)  1.3    0.41 (.029)  1.5    0.41 (.028)  1.3
A-units      0.25 (.026)  2.0    0.25 (.024)  1.6    0.25 (.025)  1.8    0.25 (.024)  1.5
GPS          0.05 (.009)  1.3    0.05 (.009)  1.1    0.05 (.009)  1.1    0.05 (.009)  1.0
plot-exp     0.02 (.002)  1.0    0.02 (.002)  1.0    0.02 (.002)  1.0    0.02 (.002)  1.0
kort         0.07 (.013)  1.4    0.07 (.013)  1.3    0.07 (.013)  1.3    0.07 (.013)  1.2
stern        0.02 (.011)  1.4    0.02 (.011)  1.2    0.02 (.011)  1.2    0.02 (.011)  1.2
ρ̂ (range)    0                   0.42                0.41                (0.27 to 0.48)

(b) Cluster = vessel-year-month, correlations between areas
gear         0.41 (.024)  1.3    0.41 (.025)  1.3    0.41 (.024)  1.3    0.41 (.025)  1.3
A-units      0.25 (.020)  1.5    0.25 (.020)  1.4    0.25 (.020)  1.5    0.25 (.020)  1.4
GPS          0.05 (.008)  1.1    0.05 (.008)  1.1    0.05 (.008)  1.1    0.05 (.008)  1.1
plot-exp     0.02 (.002)  1.0    0.02 (.002)  1.0    0.02 (.002)  1.0    0.02 (.002)  1.0
kort         0.07 (.011)  1.2    0.07 (.011)  1.2    0.07 (.011)  1.2    0.07 (.011)  1.2
stern        0.02 (.010)  1.3    0.02 (.010)  1.3    0.01 (.010)  1.3    0.01 (.010)  1.3
ρ̂ (range)    0                   0.28                0.27                (–0.75 to 0.36)

(c) Cluster = vessel-year, correlations between month-area
gear         0.41 (.039)  2.2    0.38 (.042)  1.5    0.40 (.038)  1.7    0.41 (.039)  1.9
A-units      0.25 (.033)  2.5    0.24 (.035)  1.8    0.25 (.032)  2.0    0.25 (.033)  2.4
GPS          0.05 (.012)  1.7    0.06 (.013)  1.2    0.05 (.012)  1.3    0.05 (.012)  1.5
plot-exp     0.02 (.003)  1.5    0.02 (.003)  1.0    0.02 (.003)  1.5    0.02 (.003)  1.5
kort         0.07 (.017)  1.9    0.06 (.018)  1.4    0.07 (.017)  1.5    0.07 (.017)  1.7
stern        0.02 (.014)  1.8    0.02 (.016)  1.3    0.02 (.015)  1.5    0.02 (.014)  1.6
ρ̂ (range)    0                   0.40                0.47                (–0.51 to 0.79)

(d) Cluster = vessel-area, correlations between year-months
gear         0.41 (.037)  2.1    0.35 (.036)  1.6    0.40 (.036)  1.7    0.38 (.033)  1.5
A-units      0.25 (.030)  2.3    0.22 (.032)  2.0    0.25 (.029)  1.9    0.23 (.029)  1.8
GPS          0.05 (.010)  1.4    0.03 (.009)  1.3    0.05 (.010)  1.3    0.03 (.008)  1.0
plot-exp     0.02 (.003)  1.5    0.02 (.003)  1.5    0.02 (.003)  1.5    0.01 (.002)  1.0
kort         0.07 (.016)  1.8    0.05 (.013)  1.2    0.07 (.016)  1.6    0.06 (.013)  1.2
stern        0.02 (.012)  1.5    0.02 (.011)  1.4    0.02 (.012)  1.3    0.01 (.010)  1.3
ρ̂ (range)    0                   0.27                0.35                (–0.17 to 0.44)

(e) Cluster = vessels, correlations between area-year-month (time within area)
gear         0.41 (.055)  3.1    0.29 (.058)  2.5    not estimable       not estimable
A-units      0.25 (.042)  3.2    0.18 (.045)  2.6
GPS          0.05 (.013)  1.9    0.02 (.012)  1.7
plot-exp     0.02 (.004)  2.0    0.02 (.004)  2.0
kort         0.07 (.024)  2.7    0.03 (.016)  1.5
stern        0.02 (.017)  2.1    0.03 (.014)  1.8
ρ̂            0                   0.25
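The stability criterion of step 4 can be checked directly against the coefficients in Table 2. The sketch below is only an illustration: the dictionary layout and function names are hypothetical, and only three of the six coefficients are included. For each cluster definition it lists the estimate from every working correlation structure that could be fitted, then reports the largest spread (maximum minus minimum) across structures.

```python
from typing import Dict, List

# Coefficient estimates copied from Table 2, one value per working correlation
# structure that could be fitted (only two structures for cluster = vessel).
TABLE2: Dict[str, Dict[str, List[float]]] = {
    "vessel-year-area": {"gear": [0.41, 0.41, 0.41, 0.41],
                         "GPS": [0.05, 0.05, 0.05, 0.05],
                         "kort": [0.07, 0.07, 0.07, 0.07]},
    "vessel-year-month": {"gear": [0.41, 0.41, 0.41, 0.41],
                          "GPS": [0.05, 0.05, 0.05, 0.05],
                          "kort": [0.07, 0.07, 0.07, 0.07]},
    "vessel-year": {"gear": [0.41, 0.38, 0.40, 0.41],
                    "GPS": [0.05, 0.06, 0.05, 0.05],
                    "kort": [0.07, 0.06, 0.07, 0.07]},
    "vessel-area": {"gear": [0.41, 0.35, 0.40, 0.38],
                    "GPS": [0.05, 0.03, 0.05, 0.03],
                    "kort": [0.07, 0.05, 0.07, 0.06]},
    "vessel": {"gear": [0.41, 0.29],
               "GPS": [0.05, 0.02],
               "kort": [0.07, 0.03]},
}


def max_spread(cluster: str) -> float:
    """Largest (max - min) of any listed coefficient across correlation structures."""
    return max(round(max(v) - min(v), 2) for v in TABLE2[cluster].values())


if __name__ == "__main__":
    for cluster in sorted(TABLE2, key=max_spread):
        print(f"{cluster:18s} max spread = {max_spread(cluster):.2f}")
```

The spread grows with cluster size in these three coefficients, which is the pattern the text describes; a threshold for "reasonably consistent" is a judgment call and is deliberately left out of the sketch.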

In the models with intermediate cluster sizes (clusters defined as vessel-year and vessel-area), the coefficients for gear length, GPS and kort nozzle varied between models with different correlation structures (Table 2, c and d), as did the coefficients for vessel length (not shown). In the model with the largest cluster size (vessels), coefficients were quite unstable across correlation structures within each cluster (Table 2, e). Estimates of the impact of GPS in the first year, and of kort nozzles, were most sensitive to the different correlation models. The coefficients for vessel length, kort nozzle and GPS differed most between the exchangeable model (vessel length 0.287, kort nozzle 0.033, GPS 0.016) and the independence model (vessel length 0.015, kort nozzle 0.068, GPS 0.047) within this cluster = vessels model.


TABLE 3
Analysis of residuals from GEE independence model: variance components, random effects model

                     Variance    Ratio to error variance    % of total variance
vessel                 25 669            0.17                       11
vessel-area            13 440            0.09                        5
vessel-month            1 965            0.01                        1
vessel-year            29 831            0.20                       12
vessel-year-area       23 425            0.16                       10
error                 150 263            1.00                       61
total                 244 593                                      100
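The derived columns of Table 3 follow from the variance components alone. The sketch below recomputes them, assuming that the intra-cluster correlation for a term is its variance component divided by the sum of all components, as stated in Section 3.3; the variable names are hypothetical. Small differences from the printed percentages can arise from rounding.

```python
from typing import Dict

# Variance components from Table 3 (residuals of the independence GEE model).
variance_components: Dict[str, float] = {
    "vessel": 25669,
    "vessel-area": 13440,
    "vessel-month": 1965,
    "vessel-year": 29831,
    "vessel-year-area": 23425,
    "error": 150263,
}

total = sum(variance_components.values())

# Ratio of each component to the error variance.
ratio_to_error = {
    term: v / variance_components["error"] for term, v in variance_components.items()
}

# Share of total variance, i.e. sigma_i^2 / sum of all sigma_i^2.
share_of_total = {term: v / total for term, v in variance_components.items()}

if __name__ == "__main__":
    print(f"total variance = {total:.0f}")
    for term in variance_components:
        print(f"{term:18s} ratio = {ratio_to_error[term]:.2f}  "
              f"share = {100 * share_of_total[term]:.1f}%")
```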

Standard errors for gear and A-units were notably greater in the cluster = vessels model than in the model with the simpler cluster structure of cluster = vessel-year-area. Computation of models with unstructured working correlation matrices was not achievable for the model with cluster = vessel.

According to our criteria for the best cluster model, the two simplest cluster models, vessel-year-month (correlations between areas) and vessel-year-area (correlations between months), were equally acceptable. The medium-complexity vessel-year model showed some changes in parameter estimates; it was therefore less acceptable. The two most complex cluster definitions, vessel-area and vessels, produced unacceptable results. In summary, the vessel-year-area or vessel-year-month models appeared to be most likely to include the most parsimonious model.

4.3. Analysis of residuals

There were three conclusions from the analysis of residuals from the independence model. First, the analysis of residuals supported the choice of one of the models with clustering, rather than none at all, because the variance that could be explained by the vessels term as a random effect was not negligible (Table 3). Further, the main factors vessel, year, month and area were not the only major sources of variation. Their two- and three-way interactions explain further variation, suggesting that a complicated mechanism is causing the random variation. Second, the analysis supported the choice of the models with vessels, vessel-year or vessel-year-area as clusters, because they accounted for substantial variance in the residuals (Table 3). Third, the analysis of residuals helped to rule out some candidate models. The model with vessel-month as clusters accounted for negligible variance in the residuals. When vessel-year-month was included in the random effects model of residuals, the variance estimate was large and negative. These two observations led us to reject as unsuitable both models that contained the term for months. In summary, the vessel-year-area model remained as the preferred cluster structure.

4.4. Working correlation structures

The working correlation structures were investigated within the preferred vessel-year-area model. Among the models with different correlation structures within the vessel-year-area cluster model, the coefficients, model-based standard errors and robust standard errors were similar, to two decimal places, across all correlation structures within the clusters (Table 2a).

[Figure 3 panel: AREA = Groote (5); vertical axis from –50 to 50; horizontal axis 1988 to 1996]

Figure 3. Pearson residuals from the preferred model for area 5 (Groote), over four months per year (August–November) for nine years. Values for the boats depicted in Figure 2 are joined within years.

The results from the three-dependent correlation structure were very similar to those from other correlation structures within this cluster.

To choose the correlation structure within the vessel-year-area model, we examined the ratios of standard errors (robust to model-based). Model-based standard errors from the independence model were underestimated by as much as half (compared to robust standard errors) for A-units and vessel length. Similarly, model-based standard errors from the exchangeable model were underestimated by as much as half (compared to robust standard errors) for A-units and vessel length (Table 2a). The unstructured correlation model had the smallest ratios, so was the best model on that basis. However, the exchangeable model gave very similar results, with only slightly larger ratios. We preferred the exchangeable model on the basis of parsimony.

4.5. Preferred model: vessel-year-area with exchangeable correlation

The scale parameter was very high (8.8), indicating that serious overdispersion had to be taken into account in the analysis (as was done) to provide reasonable standard errors for the estimates. Goodness of fit was otherwise adequate according to several criteria. There were no alarming patterns in the residuals (Figure 3), nor in the covariance matrix. Table 4 shows parameter estimates and robust standard errors from the preferred model (vessel-year-area-exchangeable).

5. Conclusion and discussion

In this paper we have attempted to develop a method to find the best cluster and correlation structure to account for the variation not explained by model factors. We aimed for robust estimation, computational feasibility and a reasonable approximation of complexities. We started with simple assumptions about the cluster and correlation structures that might be appropriate, because we did not know how much variation had been accounted for already by those factors in the model that seek to describe catching unit and population characteristics.


TABLE 4
Parameter estimates and robust standard errors for main effects from the GEE model with exchangeable working correlation structure, where clusters are vessel-year-area

Parameter                                    Estimate    Standard error
Intercept                                      2.00          0.104
Log effort                                     1.07          0.004
Log gear length                                0.41          0.028
Log vessel length                              0.01          0.036
Log A-units                                    0.25          0.024
Hull
  Unknown                                     –0.01          0.008
  New steel                                    0.05          0.022
  Timber                                      –0.05          0.010
  Old steel                                    0.00          0.000
GPS                                            0.05          0.009
Skipper experience with plotters (years)       0.02          0.002
Kort nozzle
  Present                                      0.07          0.013
  Unknown                                      0.07          0.016
  Absent                                       0.00          0.000
Trygear type
  Otter                                       –0.01          0.007
  Unknown                                     –0.01          0.008
  Beam type                                    0.00          0.000
Trygear position
  Stern                                        0.02          0.011
  Unknown                                      0.01          0.014
  Port or starboard                            0.00          0.000
Skipper experience
  > 12 seasons                                –0.03          0.011
  9–12 seasons                                 0.00          0.009
  5–8 seasons                                  0.00          0.008
  < 5 seasons                                  0.00          0.000

Diggle et al. (1994 p. 145) wrote:

The robustness of the inferences about beta can be checked by fitting a final model using different covariance assumptions and comparing the two sets of estimates and their robust standard errors. If they differ substantially, a more careful treatment of the covariance model may be necessary.
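As a rough illustration of the comparison Diggle et al. describe, the sketch below takes the kort nozzle and GPS coefficients reported in Section 4.2 for the independence and exchangeable structures in the cluster = vessels model, with robust standard errors from Table 2(e). Two things here are assumptions rather than part of the analysis: the conversion to a percentage effect presumes a log link, and the shift measured in units of the larger robust standard error is an informal heuristic, not a formal test (the two estimates come from the same data and are not independent).

```python
import math
from typing import Dict, Tuple

# (coefficient, robust standard error) for cluster = vessels.
estimates: Dict[str, Dict[str, Tuple[float, float]]] = {
    "kort nozzle": {"independence": (0.068, 0.024), "exchangeable": (0.033, 0.016)},
    "GPS": {"independence": (0.047, 0.013), "exchangeable": (0.016, 0.012)},
}


def percent_effect(coef: float) -> float:
    """Percentage change in expected catch implied by a coefficient, assuming a log link."""
    return (math.exp(coef) - 1.0) * 100.0


def shift_in_se_units(item: str) -> float:
    """Absolute difference between the two estimates, in units of the larger robust SE."""
    (b1, se1) = estimates[item]["independence"]
    (b2, se2) = estimates[item]["exchangeable"]
    return abs(b1 - b2) / max(se1, se2)


if __name__ == "__main__":
    for item, by_structure in estimates.items():
        for structure, (coef, se) in by_structure.items():
            print(f"{item:12s} {structure:13s} effect = {percent_effect(coef):.1f}%")
        print(f"{item:12s} shift = {shift_in_se_units(item):.2f} robust SEs")
```

Under these assumptions the implied kort nozzle effect roughly halves between the two covariance assumptions, which is the kind of substantial difference that calls for a more careful treatment of the covariance model.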

We suggest that, unless prior information can be obtained to define the appropriate cluster and correlation model for a dataset, one should not select such a model without exploring other possible cluster and correlation structures using a systematic strategy. We found that robust estimation of parameters depended more on the choice of cluster definition than on the choice of correlation structure within the cluster. Thus we suggest that it is better to spend more effort on the choice of cluster than on the choice of correlation structure within the cluster. It is desirable to develop a practical strategy for defining and comparing models.

In this analysis of fishing power, we observe that catches of prawns decline from the start of fishing to the end of the season each year. The monthly variation in catch is captured substantially by the fixed month effects in GEE model (1), leaving moderate correlation between catches in different months within any year. Years and areas can differ substantially in catch. The results show that variations between vessel catching power in different areas are of some importance, though not as important as the vessel-year variations. Spatial and annual variations in other unmeasured predictors of catch are as important as, and interact with, the unmeasured components predicting fishing power of the vessel-crew catching unit.

We acknowledge at this point that reasonable caution is required regarding our application to fisheries data. We do not have the protection against confounding that can be gained with an experimental design for data collection. We assume that the main effects in our model correctly capture the important predictors of catch, while we acknowledge that this may not be so, and we suspect there are important, unmeasured and perhaps unmeasurable determinants of catch. Pepe & Anderson (1994 p. 947) note that, to the extent that the response at one cluster can be expressed as a function of responses at other clusters, one of the conditions of robustness of GEE methods is violated. On the other hand, ‘where there is a latent random variable which induces correlation among observations at different time points’ the condition for satisfactory GEE modelling is satisfied. We must assume for now that any problems arising from inadequate design or a deficient mean model are not too serious in our data.

In this analysis of fishing power, if correlations in the data were ignored, and model-based standard errors from an independence model were used to gauge the impact of new steel boats and trygear position on vessel efficiency, then inferences such as significance tests would be misleading. Further, model-based standard errors from an independence model were underestimated by as much as half (compared to robust standard errors from the preferred model) for A-units and vessel length.
On the other hand, when correlations in the data were modelled with a poor choice of cluster structure, the parameter estimates were sensitive to the choice of correlation, and inflated standard errors resulted. The shifts in the magnitude of coefficients for kort nozzle (from 0.068 in simpler models to 0.033 in the model with vessels as clusters and exchangeable correlation) and GPS (from 0.047 to 0.016) are large enough to change the management advice relying on these analyses, with considerable implications for the industry.

We suspected that catches from the same vessel would be correlated, due to the likelihood of several important and unmeasured vessel and crew characteristics, especially characteristics related to crew experience or vessel specialization. At the same time, close cooperation between small groups of vessels fishing in the same area at the same time may lead to correlations in catch rates between vessels within these small groups. Nevertheless, we intuitively expected that vessels as clusters would be the best model, and were surprised to find that this produced the most unstable results. The instability of estimates in the models with vessels as clusters may have been partly due to the occurrence and particular pattern of within-cluster covariates in that model (Zeger & Liang, 1986; Lipsitz et al., 1994; Pepe & Anderson, 1994; Fitzmaurice, 1995). By contrast, the preferred cluster structure, cluster = vessel-year-area, featured a large number of clusters of small dimension, and there were no within-cluster covariates.

To sum up, the problems of unstable parameter estimates and over- or underestimates of standard errors can be avoided, and sometimes precision can be gained, by specifying a cluster and correlation model that adequately reflects the main features of the correlations in the data. We think that we have modelled a reasonable approximation of the complexities of these prawn catch data.
More complex models could have been tried, but we believe that there is little to be gained by further modelling of the correlation structure. We found that the GEE method was computationally efficient and allowed exploration of several models that were computationally prohibitive in other analysis procedures. Overall, the GEE approach gave a practical option to cope with non-normal and correlated data such as are found in observational studies in fisheries science and in many other disciplines related to natural resources management, where spatial and temporal processes influence the outcomes.

References

ALBERT, P.S. & McSHANE, L.M. (1995). A generalized estimating equations approach for spatially correlated binary data: applications to the analysis of neuroimaging data. Biometrics 51, 627–638.
BALEMI, A. & LEE, A. (1999). Some properties of the Liang–Zeger method applied to clustered binary regression. Aust. N. Z. J. Stat. 41, 43–58.
BISHOP, J. & STERLING, D. (1999). Survey of Technology Utilisation in the Northern Prawn Fishery Fleet. Australian Fishery Management Authority, Canberra, Australia.
CHEN, J.J. & AHN, H. (1997). Marginal models with multiplicative variance components for overdispersed binomial data. J. Agricultural, Biological and Environmental Statist. 2, 440–450.
CRESSIE, N.A.C. (1993). Statistics for Spatial Data. New York: J. Wiley & Sons.
DANN, T. & PASCOE, S. (1994). A Bioeconomic Model of the Northern Prawn Fishery. Technical Report. ABARE Research Report 94, 13. Canberra, Australia.
DIGGLE, P.J., LIANG, K.Y. & ZEGER, S.L. (1994). Analysis of Longitudinal Data. Oxford: Oxford Science Publications.
FITZMAURICE, G.M. (1995). A caveat concerning independence estimating equations with multivariate binary data. Biometrics 51, 309–317.
HILBORN, R. & WALTERS, C.J. (1992). Quantitative Fisheries Stock Assessment: Choice, Dynamics and Uncertainty. London: Chapman & Hall.
LIANG, K.Y. & ZEGER, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.
LIANG, K.Y., ZEGER, S.L. & QAQISH, B. (1992). Multivariate regression analysis for categorical data (with discussion). J. Roy. Statist. Soc. Ser. B 54, 3–40.
LIPSITZ, S.R., KIM, K. & ZHAO, L. (1994). Analysis of repeated categorical data using generalized estimating equations. Statistics in Medicine 13, 1149–1163.
LUMLEY, T. (1996). Generalized estimating equations for ordinal data: a note on working correlation structures. Biometrics 52, 354–361.
LUMLEY, T. & HEAGERTY, P.J. (1999). Weighted empirical adaptive variance estimators for correlated data regression. J. Roy. Statist. Soc. Ser. B 61, 459–477.
O’HARA HINES, R.J. (1997). Analysis of clustered polytomous data using generalized estimating equations and working covariance structures. Biometrics 53, 1552–1556.
PEPE, M.S. & ANDERSON, G.L. (1994). A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Comm. Statist. B: Simulation Comput. 23, 939–951.
ROBINS, C.M., WANG, Y.G. & DIE, D. (1998). The impact of global positioning systems and plotters on fishing power in the Northern Prawn Fishery, Australia. Canad. J. Fish. Aquat. Sci. 55, 1645–1651.
SAS Institute, Inc. (1997). SAS/STAT Software: Changes and Enhancements Through Release 6.12. Cary, North Carolina: SAS Institute, Inc.
SOMERS, I.F. (1994). Species composition and distribution of commercial penaeid prawn catches in the Gulf of Carpentaria, Australia, in relation to depth and sediment type. Austral. J. Mar. Freshw. Res. 45, 317–335.
SUTRADHAR, B.C. & DAS, K. (1999). On the efficiency of regression estimators in generalized linear models for longitudinal data. Biometrika 86, 459–465.
WANG, Y.G. & DIE, D. (1996). Stock-recruitment relationships of the tiger prawns (Penaeus esculentus and Penaeus semisulcatus) in the Australian Northern Prawn Fishery. Austral. J. Mar. Freshw. Res. 47, 87–95.
ZEGER, S.L. & LIANG, K.Y. (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics 42, 121–130.
ZEGER, S.L. & LIANG, K.Y. (1992). An overview of methods for the analysis of longitudinal data. Statistics in Medicine 11, 1825–1839.

© Australian Statistical Publishing Association Inc. 2000