Time Series Autoregressive Integrated Moving Average Modeling of Test-Day Milk Yields of Dairy Ewes

N.P.P. Macciotta, A. Cappio-Borlino, and G. Pulina

Dipartimento di Scienze Zootecniche, Università degli Studi di Sassari, Via De Nicola 9, 07100 Sassari, Italy
Received April 26, 1999. Accepted November 24, 1999. Corresponding author: N.P.P. Macciotta; e-mail: macciott@ssmain.UNISS.IT. 2000 J Dairy Sci 83:1094–1103 (Journal of Dairy Science Vol. 83, No. 5, 2000).

ABSTRACT

Monthly test-day milk yields of 1200 dairy Sarda ewes were analyzed by time-series methods. Autocorrelation functions were calculated for lactations within parity classes and altitude of location of flocks. Spectral analysis of the successions of data was carried out by Fourier transformation, and different Box-Jenkins autoregressive integrated moving average models were fitted. The separation of deterministic and stochastic components highlighted the autoregressive feature of the milk production pattern. The forecasting power of autoregressive integrated moving average models was tested by predicting total milk production for a standardized lactation length of 225 d from only a few test-day records. Results indicated a greater forecasting capacity in comparison with standard methods and suggested further development of time-series analysis for studying lactation curves with more sophisticated methods, such as wavelet decomposition and neural network models.

(Key words: test day, ewe lactation curves, time-series analysis)

Abbreviation key: ACF = autocorrelation function, ARIMA = autoregressive integrated moving average, ARMA = autoregressive moving average, PACF = partial autocorrelation function, TDY = test-day yield(s).

INTRODUCTION

The evolution of milk production along lactation is an interesting topic for both breeding and management purposes. In lactation studies, usually test-day yields (TDY) recorded at regular intervals of time from parturition (DIM) are analyzed. It is possible to consider TDY measured at different DIM intervals as genetically different variables, and, consequently, data can be analyzed with multivariate methods (18). However, according to the univariate approach, which is the most common, milk yields taken at different times along each lactation can be considered as repeated measures on the same experimental unit (26).

When the main interest lies in the lactation curve, i.e., in the continuous and regular component of milk yield evolution, TDY along each lactation are modeled by suitable mathematical functions (24, 31). Most of the studies dealing with mathematical models of lactation curves refer to dairy cows, but several functions have also been fitted to dairy sheep and goat milk production (2, 4, 8, 13, 21, 22). In TDY analysis developed by continuous functions of time, environmental effects are assumed to average out over lactation, even if there can be effects peculiar to each TDY that may not average out (15). This problem can be solved by modeling TDY directly, taking into account factors that could affect each TDY differently. Thus, Stanton et al. (25) analyzed TDY with mixed linear models, reconstructing lactation curves of dairy cows by obtaining the least squares estimates of the fixed effects of a DIM factor on milk yield (19, 27). Similar linear mixed models have been applied to the TDY of dairy sheep (3, 5). Both approaches to TDY data have been brought into one perspective by Ptak and Schaeffer (23), who suggested a repeatability animal model in which single TDY were treated as repeated measures and factors to model the lactation curve were included.

A serious drawback of lactation curve modeling by continuous functions of time is that residuals are treated only as a noise component. When such models are used for forecasting purposes, predictions are based on the deterministic component, whereas residuals give information only about the magnitude of the error variance. However, (co)variance components associated with random factors and residuals represent a relevant problem with mixed linear models.
When such models are applied to genetic evaluations, the (co)variance matrices are structured according to the relationships among animals, but modeling of repeated measures at phenotypic level should require special care over the covariance structure, because of the sequential nature of the data on each animal (11). In most of the models suggested for TDY analysis, this problem is solved by imposing a common correlation among all TDY of the same animal, even though measures close in time are
often more correlated than measures far apart in time; moreover, variances of repeated measures can change with time. These potential patterns of correlation and variation may combine to produce a complicated structure of covariance among TDY that, when ignored, may result in inefficient analysis or in incorrect conclusions (17). This paper proposes a different approach to test-day data, based on time-series analysis. Our first aim was to check the suitability of time-series methodologies, and particularly of Box-Jenkins autoregressive integrated moving average (ARIMA) modeling (1), as a tool to recognize the main structural components of time evolution of milk yield, such as the shape of lactation curve and the pattern of (co)variance of residuals. A second, more technical aim was to test the usefulness of ARIMA models to reduce the number of milk records needed to estimate total yield for standardized lactation length by forecasting missing data on the basis of few available TDY recorded at the beginning of lactation.
MATERIALS AND METHODS

Theoretical Basis of Time-Series Analysis

A time series is a set of values of a continuous random variable Y ($Y_1, Y_2, \dots, Y_n$), ordered according to a discrete index variable t (1, 2, ..., n). The term time series comes from econometric studies in which the index variable refers to intervals of time measured in a suitable scale. However, this direct reference to time is not required: any different meaning can be attributed to the index variable, provided that it is able to order the Y values (14). In general, in a given time series the following can be recognized and separated (12, 16): 1) a regular, long-term component of variability, termed trend, that represents the whole evolution pattern of the series; 2) a regular, short-term component whose shape occurs periodically at intervals of s lags of the index variable, currently known as seasonality, because this term is also derived from applications in economics; 3) an AR(p) autoregressive component of order p, which relates each value $Z_t = Y_t - (\text{trend and seasonality})$ to the p previous Z values, according to the following linear relationship

$$Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + \dots + \phi_p Z_{t-p} + \varepsilon_t \qquad [1]$$

where $\phi_i$ (i = 1, ..., p) are parameters to be estimated and $\varepsilon_t$ is a residual term; and 4) an MA(q) moving average component of order q, which relates each $Z_t$ value to the q residuals of the q previous Z estimates

$$Z_t = \varepsilon_t - \Theta_1 \varepsilon_{t-1} - \Theta_2 \varepsilon_{t-2} - \dots - \Theta_q \varepsilon_{t-q} \qquad [2]$$

where $\Theta_i$ (i = 1, ..., q) are parameters to be estimated.

The theory of time-series analysis has developed a specific language and a set of linear operators. According to Box and Jenkins (1), a highly useful operator in time-series theory is the lag or backward linear operator (B) defined by

$$B Z_t = Z_{t-1}.$$

Consider the result of applying the lag operator twice to a series: $B(B Z_t) = B(Z_{t-1}) = Z_{t-2}$. Such a double application is indicated by $B^2$, and, in general, for any integer k, it can be written $B^k Z_t = Z_{t-k}$.

By using the backward operator, Equation [1] can be rewritten as

$$Z_t - \phi_1 Z_{t-1} - \phi_2 Z_{t-2} - \dots - \phi_p Z_{t-p} = \phi(B) Z_t = \varepsilon_t \qquad [3]$$

where $\phi(B)$ is the autoregressive operator of order p defined by $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p$. Similarly, Equation [2] can be written as

$$Z_t = \varepsilon_t - \Theta_1 \varepsilon_{t-1} - \Theta_2 \varepsilon_{t-2} - \dots - \Theta_q \varepsilon_{t-q} = \Theta(B)\,\varepsilon_t \qquad [4]$$

where $\Theta(B)$ indicates the moving average operator of order q defined by $\Theta(B) = 1 - \Theta_1 B - \Theta_2 B^2 - \dots - \Theta_q B^q$.

The autoregressive and moving average components can be combined in an autoregressive moving average (ARMA) (p,q) model

$$Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + \dots + \phi_p Z_{t-p} + \varepsilon_t - \Theta_1 \varepsilon_{t-1} - \Theta_2 \varepsilon_{t-2} - \dots - \Theta_q \varepsilon_{t-q}$$

or, in lag operator form,

$$(1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p)\, Z_t = (1 - \Theta_1 B - \Theta_2 B^2 - \dots - \Theta_q B^q)\, \varepsilon_t,$$

and finally

$$\phi(B)\, Z_t = \Theta(B)\, \varepsilon_t \quad \text{and} \quad Z_t = \frac{\Theta(B)}{\phi(B)}\, \varepsilon_t. \qquad [5]$$
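The recursion behind the ARMA(p,q) model can be made concrete with a few lines of code. The following is only a minimal sketch, assuming the sign convention of Equations [1] and [2] (moving average terms enter with a minus sign) and treating values before the start of the series as zero; the function name `arma_series` and the numeric inputs are illustrative choices, not part of the original analysis.

```python
def arma_series(phi, theta, eps):
    """Build Z_t = sum_i phi[i]*Z_{t-1-i} + eps_t - sum_j theta[j]*eps_{t-1-j}.

    phi   -- autoregressive parameters (phi_1, ..., phi_p)
    theta -- moving average parameters (Theta_1, ..., Theta_q)
    eps   -- residual sequence; terms before the start are taken as zero
    """
    z = []
    for t in range(len(eps)):
        ar = sum(phi[i] * z[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        z.append(ar + eps[t] - ma)
    return z


# Response to a single unit residual at t = 0 (hypothetical parameter values).
impulse = [1.0, 0.0, 0.0, 0.0]
print(arma_series([0.5], [], impulse))  # AR(1): [1.0, 0.5, 0.25, 0.125]
print(arma_series([], [0.4], impulse))  # MA(1): [1.0, -0.4, 0.0, 0.0]
```

The two printed responses show the qualitative difference between the components: the effect of a residual decays gradually under an AR process, whereas under an MA(q) process it vanishes after q lags.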
In a preliminary analysis of a series it is useful to independently evaluate the long- and short-term periodic components, which are essential to define the regular structure of the series. The trend component can be evaluated by fitting a regular function, a polynomial, or a more complicated general function. The seasonal component can be estimated by a seasonal decomposition procedure, which calculates a seasonal index based on the ratio of the observed values to the moving average. In the final stage of series modeling, however, both the trend and the seasonal component will be integrated in the ARMA(p,q) process (1).

For the trend, such an integration is obtained by using the difference linear operator ($\nabla$), defined by $\nabla Y_t = Y_t - Y_{t-1} = Y_t - B Y_t = (1 - B) Y_t$. A single application of the $\nabla$ operator corrects the data for a linear trend, whereas its repeated use d times corrects for a trend that can be fitted by a polynomial of order d. The stationary series $Z_t$ obtained as the dth difference ($\nabla^d$) of $Y_t$, $Z_t = \nabla^d Y_t = (1 - B)^d Y_t$, can then be modeled by an ARMA(p,q) process. The combined use of the $\nabla$ operator and the ARMA(p,q) process results in an ARIMA(p,d,q) model.

Furthermore, ARIMA can account for the seasonal component of period s lags, by using both correlations between $Z_t$ and $Z_{t-s}$ values and those between the corresponding residuals $\varepsilon_t$ and $\varepsilon_{t-s}$. In mathematical terms, therefore, a seasonal ARIMA model is an ARIMA(p,d,q) model whose residuals $\varepsilon_t$ can be further modeled by an ARIMA(P,D,Q)s structure with linear operators (P,D,Q) being functions of the $B^s$ operator. The operators of a seasonal ARIMA model, defined as (p,d,q) × (P,D,Q)s, can be expressed as follows: AR(p) nonseasonal operator of order p, $\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$; AR(P) seasonal operator of order P, $\Phi(B^s) = 1 - \Phi_1 B^s - \dots - \Phi_P B^{sP}$; MA(q) nonseasonal operator of order q, $\Theta(B) = 1 - \Theta_1 B - \dots - \Theta_q B^q$; MA(Q) seasonal operator of order Q, $\Theta(B^s) = 1 - \Theta_1 B^s - \dots - \Theta_Q B^{sQ}$; and difference operator of order d, $\nabla^d = (1 - B)^d$. The series analyzed in this study was without any defined trend, so only the main properties of ARIMA (p,0,q) × (P,0,Q)s will be considered.

The Box-Jenkins methodology (1) for analyzing and modeling time series is characterized by three steps: 1) model identification, 2) parameter estimation, and 3) model validation. Model identification defines the (p,d,q) orders of the AR and MA components, both seasonal and nonseasonal. In this step, fundamental analytical tools are the spectral analysis of the Fourier transform of the original series and the autocorrelation functions (20). The spectral analysis is based on the property that any succession of values ordered by an index variable can be obtained as the sum of periodic elementary waves with different angular frequencies ($\omega_i$)

$$Z_t = \sum_{i=1}^{n} (a_i \sin \omega_i t + b_i \cos \omega_i t). \qquad [6]$$
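As a rough illustration of the decomposition in Equation [6], the sketch below computes the squared amplitudes $a_k^2 + b_k^2$ at the Fourier frequencies $\omega_k = 2\pi k/N$ for a synthetic series built by repeating a seven-point pattern ten times. The pattern values are hypothetical and are not data from the study; the function name `periodogram` and the scaling factor 2/N are assumptions of this sketch.

```python
import math


def periodogram(z):
    """Squared amplitudes a_k^2 + b_k^2 at the frequencies w_k = 2*pi*k/N."""
    n = len(z)
    mean = sum(z) / n
    power = {}
    for k in range(1, n // 2 + 1):
        w = 2 * math.pi * k / n
        a = 2 / n * sum((z[t] - mean) * math.sin(w * t) for t in range(n))
        b = 2 / n * sum((z[t] - mean) * math.cos(w * t) for t in range(n))
        power[k] = a * a + b * b
    return power


# Hypothetical 7-point pattern repeated 10 times (N = 70); illustrative only.
pattern = [1.2, 1.5, 1.4, 1.2, 1.0, 0.8, 0.6]
series = pattern * 10
power = periodogram(series)

# Frequencies whose amplitude is not numerically zero.
peaks = sorted(k for k, p in power.items() if p > 1e-12)
print(peaks)  # [10, 20, 30]
```

With N = 70 and a period of T = 7 lags, the non-zero amplitudes fall at k = 10, 20, and 30, that is, at angular frequencies equal to multiples of 2π/7. Equally spaced peaks of this kind are the signature of a strictly periodic component.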
Results of spectral analysis are usually summarized in a periodogram function that plots the average squared amplitudes ($a_i^2 + b_i^2$) of the elementary waves against the corresponding frequencies. Peaks of amplitude for well-defined frequencies are evidence of deterministic components of the variability in the original series, whereas a continuous pattern of the periodogram refers to the nondeterministic components. In particular, a periodogram that shows only isolated and discrete peaks underlines a deterministic process, whose future values can be predicted easily from the data. A periodogram nearly parallel to the frequency axis is evidence of processes that are random and unpredictable (i.e., white noise processes). The repeated occurrence of well-defined peaks at equally spaced intervals of frequency highlights the existence of a seasonal component.

The functions of autocorrelation (ACF) and partial ACF (PACF) are equally important for the definition of the internal structure of the analyzed series. The ACF $\rho(k)$ at lag k of the $Z_t$ series is the linear correlation coefficient between $Z_t$ and $Z_{t-k}$, calculated for k = 0, 1, 2, ...

$$\rho(k) = \frac{\mathrm{Cov}(Z_t, Z_{t-k})}{\sqrt{\mathrm{Var}(Z_t)\,\mathrm{Var}(Z_{t-k})}}.$$
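A sample version of the ACF can be sketched as follows. This is a minimal illustration, assuming a stationary series so that a single overall mean and variance are used for both $Z_t$ and $Z_{t-k}$ (the usual sample estimator); the function name `acf` and the seven-point pattern are hypothetical.

```python
def acf(z, k):
    """Sample autocorrelation of z at lag k, using the overall mean and variance."""
    n = len(z)
    mean = sum(z) / n
    dev = [x - mean for x in z]
    num = sum(dev[t] * dev[t - k] for t in range(k, n))
    den = sum(d * d for d in dev)
    return num / den


# Hypothetical 7-point pattern repeated 10 times; illustrative values only.
pattern = [1.2, 1.5, 1.4, 1.2, 1.0, 0.8, 0.6]
series = pattern * 10

for lag in (1, 3, 7, 14):
    print(lag, round(acf(series, lag), 3))
```

For this strictly periodic series the coefficient at lag 7 is 0.9 and at lag 14 is 0.8, because nine and eight of the ten cycles overlap, respectively. Large values at multiples of the period are the ACF counterpart of the equally spaced periodogram peaks.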
The PACF is defined as the linear correlation between Zt and Zt−k, controlling for possible effects of linear relationships among values at intermediate lags (18). Theoretically, both an AR(p) process and an MA(q) process should be associated with well-defined patterns of ACF and PACF, usually decreasing exponential or alternate in sign or decreasing sinusoidal patterns. A precise correspondence between ARMA (p,q) processes and defined ACF and PACF patterns is more difficult to recognize. When the order of at least one of the two components (AR or MA) is clearly detectable, however, the other can be identified by attempts in the following
step of parameter estimation. Finally, the existence of a seasonal component of length s is underlined by the presence of a periodic pattern of period s in the ACF.

Once a suitable ARIMA (p,0,q) × (P,0,Q)s structure is identified, subsequent steps of parameter estimation and model validation must be performed. Parameter estimates are usually obtained by maximum likelihood, which is asymptotically correct for time series. Estimators are usually sufficient, efficient, and consistent for Gaussian distributions and are asymptotically normal and efficient for several non-Gaussian distribution families.

Validation of the goodness of fit of an ARIMA model can be developed according to the following steps: 1) Evaluation of the statistical significance of parameters by the usual comparison between the parameter value and the standard error of its estimate. For a test statistic that is valid only asymptotically, a parameter whose value exceeds twice its standard error can be considered significant. 2) Analysis of the ACF of residuals. In this step, residuals ($\varepsilon_t$) are considered as a new time series, and ACF and PACF are estimated to be sure that values at lag k > 0 are not statistically different from zero. 3) Test of the white noise character of residuals, particularly by means of the Kolmogorov test, which compares the integrated periodogram (obtained as the normalized cumulative periodogram of residuals) with the theoretical periodogram of a white noise process (20). In general, the rejection of the hypothesis that the series of residuals represents a white noise process also indicates whether an alternative model should be defined and subjected to the identification, estimation, and validation steps.

The general goodness of fit of an ARIMA model can be tested by

$$R^2 = 1 - \frac{\mathrm{Var}(\varepsilon_t)}{\mathrm{Var}(Z_t)} \qquad [7]$$
by analogy to the determination coefficient of multiple regression analysis.

An ARIMA model such as [5] that fits the time series well can identify and disentangle the different structural components of the series itself, and it can also predict values of the series. For prediction purposes, ARIMA models differ from analytical functions of time, $Z_t = f(t)$, because ARIMA forecasting uses previous values of the series and errors in the previous estimates. This peculiarity of ARIMA forecasting is valid only in the short term, because the parameters of the model cannot account, in the long term, for changes in the dynamics of the series.
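The way ARIMA forecasts are built from previous values can be sketched for the simplest relevant case, a nonseasonal AR(2) process around a mean μ. This is an illustration under stated assumptions, not the model fitted in the study: the seasonal terms are omitted, future residuals are replaced by their expected value of zero, and the function name `ar2_forecast` and all numeric values are hypothetical.

```python
def ar2_forecast(last_two, mu, phi1, phi2, steps):
    """Recursive forecasts for Z_t - mu = phi1*(Z_{t-1} - mu) + phi2*(Z_{t-2} - mu) + eps_t.

    last_two -- the two most recent observed values (older first)
    Future residuals eps_t are set to zero, their expected value.
    """
    prev2, prev1 = (v - mu for v in last_two)
    out = []
    for _ in range(steps):
        nxt = phi1 * prev1 + phi2 * prev2  # forecast deviation from the mean
        out.append(mu + nxt)
        prev2, prev1 = prev1, nxt          # forecasts feed later forecasts
    return out


# Hypothetical values, for illustration only.
forecasts = ar2_forecast(last_two=(2.0, 2.2), mu=1.5, phi1=0.5, phi2=0.1, steps=5)
print([round(f, 3) for f in forecasts])
```

In this sketch each forecast is closer to the mean than the one before it, so the information carried by the last observed records fades as the horizon grows. That behaviour is one simple way to see why such forecasts are most useful in the short term.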
Table 1. Structure of data in each succession.

Ewe    Milk test-day yield    Index
1      1                      1
1      2                      2
1      3                      3
1      4                      4
1      5                      5
1      6                      6
1      7                      7
2      1                      8
2      2                      9
2      3                      10
2      4                      11
2      5                      12
2      6                      13
2      7                      14
3      1                      15
3      2                      16
3      3                      17
3      4                      18
3      5                      19
3      6                      20
3      7                      21
...    ...                    ...
100    6                      699
100    7                      700
Building Time Series by Test-Day Yields, Data Analysis, and Forecasting

To fit an ARIMA model requires a sufficiently large data set. This requirement seems to exclude the use of time-series methodology for the analysis of data coming from dairy recording programs (on average 10 to 12 TDY for cattle and 6 to 8 TDY for ewes, respectively), restricting its use to situations in which all TDY within each lactation are available (10). This constraint, however, can be removed by constructing a time series whose index variable does not have an immediate reference to time.

In this study, raw data were test-day milk yields of Sarda dairy ewes, organized in six archives (or successions) of 200 lactations each. Three of them were created by grouping lactations on the basis of parity (third, fourth, and fifth), and three were created on the basis of the levels of altitude of farms (plain, hill, and mountain). Lactations were randomly chosen from a larger data set recorded by the Breeders Association of Sassari, Italy, with the constraint of seven TDY recorded at intervals of about 30 d for each lactation. Within each succession, data were structured as follows (Table 1): lactations of different ewes were arranged sequentially in a vector (1, ..., 200), and within each ewe TDY were ordered according to their distance from lambing (1, ..., 7). The TDY series thus obtained were ordered by an index variable that no longer had a meaning of
time because, for example, the value 4 for the index corresponded to the fourth TDY of the first ewe of the succession, but the value 20 for the index corresponded to the sixth TDY of the third ewe ((7 × 2) + 6 = 20) of the succession. As reported in the previous section, however, this series can be analyzed by ARIMA models because of the correspondence between the index variable and the succession of TDY. Obviously, this peculiar meaning assigned to the index variable implies a parallel redefinition of all fundamental concepts of time-series theory. In particular, a possible trend should be interpreted as a regular tendency of milk yield to increase (or to decrease) along the whole succession of lactations; the seasonal component, i.e., the amount of variability whose shape repeatedly occurs within a well-defined period, corresponds to the average pattern of the lactation curve; and finally, the AR(p) and MA(q) components underline linkages between one record and the record (or residual) that is p (or q) lags earlier, either within the same lactation or in the previous one.

To define the most appropriate ARIMA model, to estimate parameters, and to evaluate goodness of fit, the first 100 lactations of each succession were used (base series). The ability of estimated models to forecast was then tested on the remaining 100 lactations (control series). Two situations of incomplete lactations were simulated for each control series: only the first two available TDY of each lactation or only the first four available TDY. The ARIMA models fitted to the corresponding base series were then used to predict the remaining five or three missing TDY, respectively. Consequently, three subsets of data were available for each ewe of the control series: all seven actual TDY, two actual and five predicted TDY, and four actual and three predicted TDY.
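The construction of the index variable and of the incomplete control lactations can be sketched as follows. The function names and the example yields are hypothetical; only the arithmetic (seven TDY per lactation, lactations arranged one after another) comes from the description above.

```python
TDY_PER_LACTATION = 7


def series_index(ewe, tdy):
    """Position in the succession of the tdy-th record (1..7) of the ewe-th lactation."""
    return TDY_PER_LACTATION * (ewe - 1) + tdy


def ewe_and_tdy(index):
    """Inverse mapping: (ewe, tdy) for a 1-based series index."""
    ewe, tdy = divmod(index - 1, TDY_PER_LACTATION)
    return ewe + 1, tdy + 1


def mask_lactation(tdy_values, known):
    """Keep the first `known` records and mark the others as missing (None)."""
    return list(tdy_values[:known]) + [None] * (len(tdy_values) - known)


print(series_index(3, 6))  # 20, i.e. (7 x 2) + 6
print(ewe_and_tdy(700))    # (100, 7), the last record of a base series

# Hypothetical lactation (kg/d) with only the first two TDY retained.
print(mask_lactation([1.2, 1.5, 1.4, 1.2, 1.0, 0.8, 0.6], known=2))
```

Because the position within the lactation is the index modulo seven, the average lactation curve appears in the series as a component with a period of seven lags.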
Total milk yield for a standardized lactation length of 225 d was then calculated for each ewe of the control series using the Fleischmann methodology (9) on the three subsets of TDY. This gave an actual production (AY) with all seven actual TDY; an estimated production with two actual and five predicted TDY (FY2), with the last actual TDY recorded at about 65 DIM; and an estimated production with four actual and three predicted TDY (FY4), with the last actual TDY recorded at approximately 125 DIM. The goodness of fit of the time-series predictions was assessed by comparing actual and predicted productions. Criteria for comparison were Pearson and rank correlations between actual and predicted productions; the difference between their standard deviations; the standard deviation of differences between actual and predicted productions; the prediction bias; and the ratio between the standard deviation of differences between actual and predicted production and the mean value.
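The comparison criteria listed above can be written out explicitly. The sketch below is one possible reading, with several assumptions: sample standard deviations use n − 1, the rank correlation is computed as the Pearson correlation of ranks without tie handling, the bias is the mean of actual minus predicted, and the final ratio uses the mean actual production as denominator. All yields are hypothetical.

```python
import math


def mean(x):
    return sum(x) / len(x)


def sd(x):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))


def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def ranks(x):
    """Ranks 1..n; assumes no tied values, for simplicity."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def comparison_criteria(actual, predicted):
    diff = [a - p for a, p in zip(actual, predicted)]
    return {
        "pearson": pearson(actual, predicted),
        "rank_correlation": pearson(ranks(actual), ranks(predicted)),
        "sd_difference": sd(actual) - sd(predicted),
        "bias": mean(diff),
        "sd_of_differences": sd(diff),
        "relative_error_pct": 100 * sd(diff) / mean(actual),
    }


# Hypothetical total yields (kg) for five ewes; illustrative only.
actual = [310.0, 280.0, 350.0, 265.0, 330.0]
predicted = [305.0, 290.0, 340.0, 270.0, 335.0]
criteria = comparison_criteria(actual, predicted)
for name, value in criteria.items():
    print(f"{name}: {value:.3f}")
```

In this toy example the predictions preserve the ranking of the ewes exactly, so the rank correlation is 1, while the Pearson correlation is slightly lower because the sizes of the differences vary.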
Figure 1. Pattern of raw data of the succession.
RESULTS AND DISCUSSION

Figure 1 shows the pattern of raw data in the base series built up with lactations of ewes belonging to farms located on the plain. No defined trend can be recognized, regardless of the order of the ewes in the succession; thus, difference operators (d and D) were set to zero. All figures reported in the text refer to ewes on the plain, but comparable ones have been obtained for all of the other data sets. Results of Fourier transformation, reported in the periodogram of Figure 2, clearly show the occurrence of well-defined peaks at equally spaced intervals of the angular frequencies $\omega_i = n \times 2\pi/T$, where T is the period of seven lags of the index variable. This pattern underlines the existence of a periodic deterministic component that occurs in all lactations and that can be identified with the amount of variability explained by the average lactation curve of the succession. For the seven lags of each period, the seasonal decomposition procedure is able to calculate the seasonal indexes reported in Figure 3. By transformation of such indexes into values of milk yield, the lactation curve of ewes of plain farms is obtained and is reported in Figure 4 with the lactation curves of ewes of mountain and hill farms obtained with the same procedure. The shape of lactation curves is in agreement with results of previous studies on dairy ewes in the Mediterranean areas (3, 5).

Figure 2. Periodogram of result of Fourier transformation of the succession. The wave intensity is expressed in an arbitrary scale. The angular frequency (ω) corresponds to n × 2π/T, where T is the lag.

Figure 3. Pattern of the seasonal component of the succession.

Figure 4. Average lactation curves of the three levels of altitude. Key: ■ = plain, ▲ = hill, and ● = mountain.

Furthermore, the periodogram shows several less-defined peaks, located at low angular frequencies, that indicate a residual deterministic linkage among TDY within each lactation. This component can be modeled by an AR(p), an MA(q), or an ARMA(p,q) process. Autocorrelation functions (Figures 5 and 6) further confirm results of spectral analysis for the identification of a suitable ARIMA structure. The relevance of the periodic component of the variability (average lactation curve) is confirmed by the values of ACF and PACF, significantly different from zero at lags of multiples of seven. Such a component can be modeled with a seasonal ARIMA (1,0,1) with a period of seven lags. Moreover, the ACF shows a sinusoidal decreasing pattern that is typical of autoregressive processes of the second order (AR(2)) (14). On the basis of these considerations, the most suitable model for the successions under study seems to be an ARIMA (2,0,0) × (1,0,1)7 structure that, in terms of the linear operators described in the previous section, is

$$Z_t = \mu + \frac{(1 - \Theta_1 B^7)}{(1 - \phi_1 B - \phi_2 B^2)(1 - \Phi_1 B^7)}\, \varepsilon_t \qquad [8]$$

where μ is the overall mean; $\Theta_1$ is the MA(1) seasonal operator; $\phi_1$ and $\phi_2$ are the AR(1) and AR(2) nonseasonal operators; $\Phi_1$ is the AR(1) seasonal operator; B is the backward operator; and the meanings of the operators are as described in the previous section.

Figure 5. Autocorrelation function of the original data of the succession. Dotted lines are plotted at zero plus and minus twice the standard errors for each coefficient.

Results of estimation are reported in Tables 2 and 3. Goodness of fit of the proposed ARIMA structure is
Figure 6. Partial autocorrelation function of the original data of the succession. Dotted lines are plotted at zero plus and minus twice the standard errors for each coefficient.

Table 2. Results of autoregressive integrated moving average model estimates for the three farm locations.

                        Plain                       Hill                        Mountain
Parameter               Estimate  SE     T(1)       Estimate  SE     T          Estimate  SE     T
Mean                    1.545     0.146  10.61      1.398     0.174  8.05       1.167     0.085  13.70
AR1(2), nonseasonal     0.494     0.036  13.50      0.629     0.034  18.43      0.346     0.038  9.19
AR2(3), nonseasonal     0.055     0.042  1.32       0.072     0.044  1.66       0.054     0.040  1.35
AR(4), seasonal         0.995     0.006  176.18     0.982     0.009  113.10     0.988     0.007  135.28
MA(5), seasonal         0.711     0.028  25.04      0.658     0.032  20.63      0.723     0.029  24.99
Residual variance       0.084                       0.076                       0.050
Original variance       0.450                       0.325                       0.181

(1) T = t-ratio.
(2) Autoregressive parameter of the first order.
(3) Autoregressive parameter of the second order.
(4) Autoregressive parameter.
(5) Moving average parameter.
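To see which earlier records enter the ARIMA (2,0,0) × (1,0,1)7 model of Equation [8], the product of the nonseasonal and seasonal AR operators can be expanded. The sketch below does this with the point estimates tabulated for the plain succession in Table 2; the helper `poly_mul` is a hypothetical name, and the expansion is plain polynomial multiplication in B rather than a reproduction of the software used by the authors.

```python
def poly_mul(p, q):
    """Multiply two polynomials in B given as coefficient lists (index = power of B)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


# Point estimates for the plain succession, as tabulated in Table 2.
phi1, phi2, Phi1 = 0.494, 0.055, 0.995

nonseasonal = [1.0, -phi1, -phi2]        # 1 - phi1*B - phi2*B^2
seasonal = [1.0] + [0.0] * 6 + [-Phi1]   # 1 - Phi1*B^7
ar_poly = poly_mul(nonseasonal, seasonal)

# (Z_t - mu) is regressed on its own lags with weight equal to minus the coefficient.
for lag, c in enumerate(ar_poly):
    if lag > 0 and c != 0.0:
        print(f"lag {lag}: weight {-c:.4f}")
```

The expanded operator has non-zero terms at lags 1, 2, 7, 8, and 9. Each record is therefore linked to the two preceding records and to the records at and just before the same stage of the preceding lactation in the succession.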
evidenced by the statistical significance of all parameters for all successions, except for the AR2 operator that, in any case, is often near the level of significance. The amount of original variability explained by the model, as measured by [7], averaged 75%. Finally, the capability of the model to take account of all relevant deterministic components is highlighted by the absence of significant values in ACF and PACF of residuals (Figures 7 and 8) and by the results of the Kolmogorov statistic. This finding indicates a probability of residuals to be a white noise process greater than 0.99 for all successions.

The main results of the comparison between actual and predicted total milk yields, with reference to lactations belonging to different farm locations and to different parity stages, are reported in Tables 4 and 5. Values of Pearson and rank correlations between actual and predicted yields are high (about 0.92 when two of seven TDY are known and about 0.98 when four are known). Random changing of the order of lactations, in the base and control successions, did not result in any appreciable variation of the predictions. Standard deviation of estimates and, above all, the difference between standard deviation of actual and predicted productions suggest further considerations. In the usual methods for prediction of total milk yield, the cumulated yield at the DIM at which the last available TDY occurs is projected to the standardized lactation length by the use of a multiplicative coefficient that varies according to some relevant factors, such as breed, age, parity stage, and month of lambing (6). As a consequence, ewes that have the same cumulated milk yield at the last recorded TDY and that are of the same breed and season of lambing will have the same total milk yield, thus resulting in an artificial compression of variance that presumably will affect genetic evaluations. With the ARIMA predictions, single TDY rather than projected cumulated yields are estimated, resulting in variances of estimates of the same order as actual variances.
Table 3. Results of autoregressive integrated moving average model estimates for the three levels of parity.

                        Third                       Fourth                      Fifth
Parameter               Estimate  SE     T(1)       Estimate  SE     T          Estimate  SE     T
Mean                    1.204     0.164  7.34       1.305     0.139  9.41       0.973     0.115  8.49
AR1(2), nonseasonal     0.500     0.041  12.29      0.532     0.040  13.44      0.425     0.043  9.83
AR2(3), nonseasonal     0.165     0.046  3.58       0.089     0.046  1.91       0.072     0.048  1.51
AR(4), seasonal         0.995     0.007  145.77     0.986     0.008  118.38     0.987     0.011  90.70
MA(5), seasonal         0.733     0.031  23.50      0.686     0.034  20.40      0.672     0.037  18.08
Residual variance       0.092                       0.066                       0.060
Original variance       0.368                       0.295                       0.190

(1) T = t-ratio.
(2) Autoregressive parameter of the first order.
(3) Autoregressive parameter of the second order.
(4) Autoregressive parameter.
(5) Moving average parameter.
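As a worked check, Equation [7] can be applied to the residual and original variances tabulated in Tables 2 and 3. The sketch below uses only those tabulated values; the dictionary layout and variable names are illustrative.

```python
# (residual variance, original variance) as tabulated in Tables 2 and 3.
variances = {
    "plain": (0.084, 0.450),
    "hill": (0.076, 0.325),
    "mountain": (0.050, 0.181),
    "third": (0.092, 0.368),
    "fourth": (0.066, 0.295),
    "fifth": (0.060, 0.190),
}

# Equation [7]: R2 = 1 - Var(residuals) / Var(original series).
r2 = {name: 1 - res / orig for name, (res, orig) in variances.items()}
for name, value in r2.items():
    print(f"{name:9s} R2 = {value:.3f}")

average = sum(r2.values()) / len(r2)
print(f"average   R2 = {average:.3f}")
```

The six values range from about 0.68 (fifth parity) to about 0.81 (plain), and their average is about 0.75, in line with the 75% of original variability reported in the text.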
Figure 7. Autocorrelation function of residuals of autoregressive integrated moving average modeling the succession. Dotted lines are plotted at zero plus and minus twice the standard errors for each coefficient.
Figure 8. Partial autocorrelation function of residuals of autoregressive integrated moving average modeling the succession. Dotted lines are plotted at zero plus and minus twice the standard errors for each coefficient.
The magnitude of the prediction bias is negligible and, more important, of different sign in the different classes, indicating the absence of any tendency of the model to over- or underestimate TDY. Finally, the last index, obtained as 100 times the ratio of the standard deviation of the difference between actual and predicted productions to the mean production, and usually proposed as a measure of the goodness of predictions, was about 8.5 (except for ewes in the third parity) for total yield forecast from two actual test-day records, whereas it was about 4.5 for total yield forecast from four actual test-day records. These results are comparable with those obtained in dairy cattle (30), thus confirming the good forecasting power of ARIMA modeling.
CONCLUSIONS

The method suggested in this paper to construct ordered series from test-day milk yields (by making up random successions of several lactations similar in some properties) was intended to call attention to the usefulness of combining more traditional analysis of milk yield evolution with alternative methods that have been developed within the context of adaptive signal processing and forecasting. The relevance of such methods results from their ability to recognize continuous and regular patterns in the whole variability of processes that have both deterministic and random components, and mainly to take account of possible linkages that still remain among residuals. Results obtained in this work demonstrate that ARIMA models are able to identify the shape of the average lactation curve and to highlight the autoregressive nature of TDY residuals within each lactation. Such a conclusion is in agreement with many instances in the literature in which low-order autoregressive processes have been proposed as possible structures for modeling milk yield evolution (7, 28, 29).
Table 4. Statistics of the predictions for the three farm locations.

                              65 DIM                        125 DIM
Parameter                     Plain    Hill     Mountain    Plain    Hill     Mountain
AY(1) (kg)                    325      296      238         325      296      238
FY(2) (kg)                    330      299      234         328      294      236
Correlation (AY, FY)          0.93     0.92     0.91        0.98     0.98     0.99
Rank correlation (AY, FY)     0.92     0.92     0.87        0.98     0.98     0.97
σ(FY)                         67.17    56.20    42.78       68.53    63.14    45.84
σ(AY) − σ(FY)                 5.51     7.77     7.41        4.16     0.83     4.35
Mean (AY − FY)                −4.64    −3.32    3.90        −2.45    1.30     1.95
σ(AY − FY)                    26.90    25.71    20.80       15.21    13.23    8.86
σ(AY − FY) × 100 / AY         8.16     8.60     8.88        4.64     4.50     3.75

(1) Actual yield.
(2) Forecasted yield.
Table 5. Statistics of the predictions for the three parity stages.

                              65 DIM                        125 DIM
Parameter                     3rd      4th      5th         3rd      4th      5th
AY(1) (kg)                    280      280      227         280      280      227
FY(2) (kg)                    283      283      224         281      278      225
Correlation (AY, FY)          0.91     0.93     0.94        0.98     0.98     0.98
Rank correlation (AY, FY)     0.90     0.92     0.91        0.98     0.98     0.98
σ(FY)                         54.35    55.57    45.05       64.57    61.00    50.75
σ(AY) − σ(FY)                 12.49    7.32     8.73        2.27     1.89     3.04
Mean (AY − FY)                −2.73    −2.22    2.52        −1.33    2.29     1.42
σ(AY − FY)                    28.51    23.69    18.86       13.16    11.17    9.69
σ(AY − FY) × 100 / AY         10.08    8.38     8.40        4.68     4.02     4.30

(1) Actual yield.
(2) Forecasted yield.
Furthermore, ARIMA models provide a simple and flexible tool to forecast lacking TDY values, thus obtaining sufficient data to estimate the whole production for a standardized lactation length with the same accuracy obtained with more complicated methods of prediction. Accuracy could be further improved by adopting a more selective criterion for the construction of more homogeneous groups of lactations. Such a property of ARIMA modeling could be very useful to increase the impact of selection programs in populations such as the Sarda dairy ewe, one of the most important sheep breeds of the Mediterranean areas (5,000,000 head). The critical point is represented by the small number of recorded animals (at present, 2% of the whole population) because of the high expenses for official controls (about 12 times more expensive than those for dairy cattle). Adaptive signal processing and its application in time-series prediction will become more important in the future, when dairy animals may have much longer official test intervals but much more frequent records by the farm owner. More farms will have automated milk recording systems that could provide data throughout the lactation, both on milk yield and on several physical and chemical properties associated with milk quality. To analyze and to decode such a complex net of information, more powerful and sophisticated tools of signal processing will be necessary to manage several time series of different variables at the same time and with different recording frequencies. Methods based on cross correlations among time series, artificial neural networks, and wavelet decomposition seem particularly appealing for these purposes. ACKNOWLEDGMENTS The authors thank R. L. Quaas for his useful comments and suggestions to improve the work and C. Zanolla for her technical support. This work was funded Journal of Dairy Science Vol. 83, No. 5, 2000
by Ministero dell’Università e della Ricerca Scientifica e Tecnologica (grant 60%).

REFERENCES
1 Box, G.E.P., and G. M. Jenkins. 1970. Time series analysis: forecasting and control. Holden Day, San Francisco, CA.
2 Cappio-Borlino, A., N.P.P. Macciotta, and G. Pulina. 1997. The shape of Sarda ewe lactation curve analyzed with a compartmental model. Livest. Prod. Sci. 51:89–96.
3 Cappio-Borlino, A., B. Portolano, M. Todaro, N.P.P. Macciotta, P. Giaccone, and G. Pulina. 1997. Lactation curves of Valle del Belice dairy ewes for milk, fat and protein estimated with test day models. J. Dairy Sci. 80:3023–3029.
4 Cappio-Borlino, A., G. Pulina, and G. Rossi. 1995. A non-linear modification of Wood’s equation fitted to lactation curves of Sardinian dairy ewes. Small Rum. Res. 18:75–79.
5 Carta, A., S. R. Sanna, and S. Casu. 1995. Estimating lactation curves and seasonal effects for milk, fat, and protein in Sarda dairy sheep with a test day model. Livest. Prod. Sci. 44:37–44.
6 Carta, A., S. R. Sanna, A. Rosati, and S. Casu. 1998. Milk yield adjustments for milking length and age-parity-lambing month interaction in Sarda dairy sheep. Ann. Zootech. 47:59–66.
7 Carvalheira, J.G.V., R. W. Blake, E. J. Pollak, R. L. Quaas, and C. V. Duran-Castro. 1998. Application of an autoregressive process to estimate genetic parameters and breeding values for daily milk yield in a tropical herd of lucerna cattle and in United States Holstein herds. J. Dairy Sci. 81:2738–2751.
8 Chang, Y. M. 1999. Bayesian analysis of lactation curves in Dairy sheep. M.S. thesis, University of Wisconsin, USA.
9 Commitee International par les Controles des performances en Elevage. 1992. Reglement international pour le controle laitiere ovin. ICAR, Rome, Italy.
10 Deluyker, H. A., R. H. Shumway, W. E. Wecker, A. S. Azari, and L. D. Weaver. 1990. Modelling daily milk yield in holstein cows using time series analysis. J. Dairy Sci. 73:539–548.
11 Diggle, P. J. 1988. An approach to the analysis of repeated measurements. Biometrics 44:959–971.
12 Fuller, F. C., Jr., and C. P. Tsokos. 1971. Time series analysis of water pollution data. Biometrics 27:1017–1034.
13 Groenwald, P.C.N., A. V. Ferreira, H. J. Van der Merwe, and C. J. Slippers. 1995. A mathematical model for describing and predicting the lactation curve of Merino ewes. Anim. Sci. 61:95–101.
14 Hamilton, J. D. 1994. Time series analysis. Princeton University Press, Princeton NY.
15 Jamrozik, J., and L. R. Schaeffer. 1997. Estimates of genetic parameters for a test day model with random regressions for yield traits of first lactation holsteins. J. Dairy Sci. 80:762–770.
16 Kendall, M. G., and A. Stuart. 1966. The advanced theory of statistics. Vol. 3. Design and Analysis and Time-Series. Charles Griffin & Co. Ltd., London, United Kingdom.
17 Littel, R. C., P. R. Henry, and C. B. Ammermann. 1998. Statistical analysis of repeated measures data using SAS procedures. J. Anim. Sci. 76:1216–1231.
18 Morrison, D. F. 1967. Multivariate Statistical Methods. McGraw-Hill, Inc., New York, NY.
19 Pander, B. L., W. G. Hill, and R. Thompson. 1992. Genetic parameters of test day records of british holstein-friesian heifers. Anim. Prod. 55:11–21.
20 Piccolo, D. 1990. Introduzione all’analisi delle serie storiche (Introduction to time series analysis). La Nuova Italia Scientifica, Rome, Italy.
21 Portolano, B., F. Spatafora, G. Bono, S. Margiotta, M. Todaro, V. Ortoleva, and G. Leto. 1996. Application of the Wood model to lactation curves of Comisana sheep. Small Rum. Res. 24:7–13.
22 Portolano, B., M. Todaro, F. Spatafora, G. Bono, S. Margiotta, P. Giaccone, and V. Ortoleva. 1996. Confronto fra due differenti modelli della curva di lattazione in pecore da latte di razza Comisana (Comparison between two different lactation curve models in dairy Comisana ewes). Zootec. Nutr. Anim. 22:323–331.
23 Ptak, E., and L. R. Schaeffer. 1993. Use of test day yields for genetic evaluations of dairy sires and cows. Livest. Prod. Sci. 34:23–34.
24 Rook, A., J. France, and M. S. Danhoa. 1993. On the mathematical description of lactation curves. J. Agric. Sci. Camb. 121:97.
25 Stanton, T. L., L. R. Jones, R. W. Everett, and S. D. Kachman. 1992. Estimating milk, fat, and protein lactation curves with a test day model. J. Dairy Sci. 75:1691–1700.
26 Van der Werf, J., and L. R. Schaeffer. Random regression in animal breeding. Course notes. CGIL Guelph, June 25–June 28, 1997. University of Guelph, Ont.
27 Vargas, B., E. Perez, and J.A.M. Van Arendonk. 1998. Analysis of test day yield data of Costa Rican dairy cattle. J. Dairy Sci. 81:255–263.
28 Wade, K. M., and R. Lacroix. 1994. The role of artificial neural networks in animal breeding. Proc. 5th World Congr. Genet. Appl. Livest. Prod. 22:31–34.
29 Wade, K. M., R. L. Quaas, and L. D. Van Vleck. Estimation of the parameters involved in a first order autoregressive process for contemporary groups. J. Dairy Sci. 76:3033–3040.
30 Wilmink, J.B.M. 1987. Comparison of different methods of predicting 305-day milk yield using means calculated from within-herd lactation curves. Livest. Prod. Sci. 17:1–17.
31 Wood, P.D.P. 1967. Algebraic model of the lactation curve in cattle. Nature (Lond.) 216:164–165.
Journal of Dairy Science Vol. 83, No. 5, 2000