
Precision Agric DOI 10.1007/s11119-012-9300-7

An integrated framework for software to provide yield data cleaning and estimation of an opportunity index for site-specific crop management Wei Sun • Brett Whelan • Alex B. McBratney • Budiman Minasny

© Springer Science+Business Media New York 2012

Abstract This paper proposes an integrated framework for software that provides yield data cleaning and yield opportunity index (Yi) calculation for site-specific crop management (SSCM). The artifacts that inevitably occur in many yield data sets can significantly affect the validity of Yi. Automated and standardised yield correction procedures were designed to improve the data quality by removing: (1) unreasonable outliers; (2) distribution outliers (globally and locally); and (3) position errors. The calculation of Yi uses two aspects of crop yield assessment: the magnitude of yield variation and the spatial structure of the variation. The cleaning algorithms were applied to four yield data sets with known integrity issues to demonstrate their effectiveness. Approximately 13–20 % of the original yield data were removed, and this resulted in an average increase in mean yield of 0.13 t/ha. The semivariograms of the cleaned data possessed smaller nugget values than those of the original data. The opportunity index calculation algorithm was demonstrated on a field with nine seasons of yield data. The results demonstrated that a ranking of Yi provides a rational, agronomic assessment of the opportunity for SSCM based on the quantity and pattern of production variability displayed in yield data sets. This provides farm managers with a rapid way to assess whether the observed variability deserves further investigation and eventual investment in SSCM operations.

Keywords Precision agriculture · Yield variability · Spatial variation · Yield maps · Yield data trimming

W. Sun (✉)
State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
e-mail: [email protected]

B. Whelan · A. B. McBratney · B. Minasny
Precision Agriculture Laboratory, Faculty of Agriculture and Environment, The University of Sydney, Sydney, NSW 2015, Australia


Introduction

Site-specific crop management (SSCM), primarily introduced by Robert (1989), is an agronomic practice with the intent of implementing field management that responds to the existence of in-field variability. The shift to SSCM from traditional uniform management is occurring because it offers the potential to increase the efficient use of farm inputs. However, the adoption of SSCM without sound agronomic assessment may lead to lower profitability and poor environmental outcomes (Whelan and McBratney 2000). The lack of a simple decision system with which initial assessments can be made has somewhat limited the development of SSCM.

In Australian agriculture, the development of decision support systems (DSS) has not progressed as well as experts had expected (Nguyen et al. 2006). Most farmers may not appreciate how using a DSS can help, because they see the variability in seasons, soils and crops contributing to a complexity that they believe is beyond the capabilities of the available programs, and their ability to operate them. In many cases, rules of thumb and decision trees may actually work better than computer-based DSS (Nguyen et al. 2006). In this regard, the most acceptable DSS for early decisions in SSCM may be one that provides a simple approach to assess the extent of production variability within a field or a farm.

To tackle this issue, Pringle et al. (2003) proposed an approach to assess the opportunity for SSCM based on the magnitude of yield variation and the spatial structure of yield variation. The aim was to find a threshold for judging whether fields are suitable candidates for further expenditure of time and money on SSCM. The opportunity index (Oi) as proposed by Pringle et al. (2003) was tested by de Oliveira et al. (2007) over a large number of fields in Australia, and it was found that the Oi was inappropriate when dealing with frequent instability in yield variability. Yieldex (Yi) was proposed (de Oliveira et al. 2007) as an improvement on the calculations of Oi by providing a more flexible and robust method to address both stationary and non-stationary spatial yield distribution. However, the issue remains that while farmers have been building a library of historical data on within-field yield variability, such decision tools remain in the research realm and a practical implementation is not available.

Adding to the implementation restriction is the fact that historical yield data from yield monitors inevitably contain some artifacts (Beck et al. 2001) due to georeferencing errors, sensor errors and operation errors. These errors can affect the intrinsic distribution of yield variation (Sudduth and Drummond 2007; Simbahan et al. 2004), and therefore lead to potentially inappropriate management decisions from raw yield maps (Griffin et al. 2008).

This work aims to provide a solution to the two issues by designing and implementing a software solution for cleaning raw yield monitor data and then applying the opportunity index (Yi) to provide an assessment tool for further consideration of SSCM. The development of the framework and its implementation in preliminary software in this paper is aimed at computing and ranking the opportunity for further exploration and adoption of SSCM strategies. The authors acknowledge that there may be occurrences where the variability in historical yield data does not completely identify the full variation in production and therefore the opportunity for SSCM. An example would be a field where two different and spatially segregated existing soil amelioration issues produce a similar yield response. However, using the opportunity ranking with the decision rules created by farmers and their local agronomic advisors should help decide where to initially target resources for the more obvious applications of SSCM and help map out a whole-farm strategy for further investigations.


Methods and materials

Yield data processing methods

Yield calculation

Harvester-mounted yield monitoring systems are widely commercially available, but the different brands often output data in different formats (e.g. different data columns and different units). In this software, the yield is calculated as a mass per unit area and requires the mass flow, distance travelled, harvest width and grain moisture to complete the yield calculation. A number of user-selectable formulas are included to accommodate differences in input data.

Spatial coordinate conversion

Yield monitor data sets record locations using geographical coordinates (latitude and longitude) gathered from a Global Navigation Satellite System (GNSS) receiver, but in order to apply mathematical analyses based on distance, the positions need to be converted to Cartesian coordinates. An algorithm was integrated within the software to convert latitude and longitude to the Universal Transverse Mercator (UTM) projection according to the formula described in the US Army technical manual (1973).

Removal of unreasonable outliers

Crop yield (mass/area) can be expected to fall within a range of realistic values for each crop. The upper and lower limits of this range can be set within the software to remove the obvious unreasonable outliers. Limits of 0.1 and 10 t/ha were used in this study.

Removal of distribution outliers

In distribution statistics, the three-sigma rule states that for a normal distribution, about 99.7 % of values lie within 3 standard deviations of the mean (Pukelsheim 1994). In this software, the mean and standard deviation (SD) are determined for the whole data set, and calculated yield values that fall outside of the mean ± 2.5 SD are considered as outliers by default. Taylor et al. (2007) also used this algorithm to clean data sets based on calculated crop yield. The SD coefficients for the upper and lower limits can, however, be set by the user.
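The range check and the global distribution check described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the software's own code: the function name `clean_yield_globally` and its parameter names are hypothetical, and it assumes that the mean and SD are computed after the unreasonable values have been dropped, which follows the step order of Fig. 1 but is not stated explicitly in the text.

```python
import numpy as np

def clean_yield_globally(yield_t_ha, lower=0.1, upper=10.0, k_low=2.5, k_high=2.5):
    """Return a boolean keep-mask after the range filter and the global SD filter.

    lower/upper are the realistic yield limits (t/ha) used in this study;
    k_low/k_high are the user-adjustable SD coefficients (2.5 by default).
    """
    y = np.asarray(yield_t_ha, dtype=float)
    # Step 1: drop values outside the range of realistic yields.
    in_range = (y >= lower) & (y <= upper)
    # Step 2 (assumption): mean and SD come from the values that survived step 1.
    mean = y[in_range].mean()
    sd = y[in_range].std(ddof=1)
    within_sd = (y >= mean - k_low * sd) & (y <= mean + k_high * sd)
    return in_range & within_sd

# Made-up yields (t/ha): one zero reading, one spike and one impossible value.
sample = [0.0, 2.0, 2.1, 1.9, 2.2, 2.0, 1.8, 2.05, 1.95, 2.1, 9.0, 12.0]
mask = clean_yield_globally(sample)
```

Passing a smaller `k_low` (for example 1.5) tightens only the lower limit, which corresponds to the asymmetric trimming the software allows for negatively skewed data.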
User control of these coefficients is necessary because there are occasions where yield monitor data are negatively skewed as the result of harvesting artifacts such as those caused by harvesting with an incompletely filled header or travelling/turning over harvested areas with the header down (Taylor et al. 2007). A reduction of the lower limit coefficient may be warranted in such cases (e.g. mean − 1.5 SD). Applying this procedure to a data set as a whole removed the extremes from the data set, but did not deal with local extremes within the field (spatial outliers). A local application of the distribution identification procedure was therefore also included, whereby a local neighborhood with a search radius of 25 m was identified at the nodes of a 5-m grid across the field. The 25-m local neighborhood allowed four harvester swaths to be included for the majority of harvesting setups, and the 5-m grid was chosen as a tradeoff between ensuring the neighborhoods overlap and keeping the number of calculations (and therefore total


calculation time) down across large fields. The detection criterion was the same as the one used in the global detection step. In the local filtering process, the outliers were marked for deletion in each moving window, but were only deleted once the whole local checking process was completed.

Removal of position errors

Position allocation errors are inevitable in yield data sets, although the improving accuracy of GNSS receivers and the use of controlled-traffic farming are reducing the incidence. Complicated topography, atmospheric interference and other operational errors still appear in position information within yield data sets. Blackmore and Moore (1999) identified two types of position errors: those that affect the whole data set, which obviously lie outside the field and are easy to correct, and those where allocated positions depart from regular harvest tracks, which are difficult to determine automatically through an algorithm. Data points that fall within the second category will cause problems when incorporated into spatial analyses of yield variation that rely on using the separation distance between samples.

To identify these data points it is possible to begin by identifying each harvest run using the ‘harvest pass’ column recorded by the yield monitors. The harvest pass number changes when the header is raised at the end of a harvest run. However, identification of each row by this method is problematic in continuously harvested operations and where the header is raised and lowered at times other than the end of harvest runs. In order to identify each run more effectively, another approach was designed and used in conjunction with the ‘harvest pass’ identifier. In an individual pass, at least one of the positional coordinates (x, y) should keep incrementally increasing or decreasing (depending on which direction the harvester is travelling). An algorithm was employed to check each identified ‘harvest pass’ and essentially calculate the bearing of the pass. At present the system is designed to flag a new pass if the bearing changes by more than 90°. Using the two methods together improves the identification of separate harvest passes compared with applying each on its own.

With the harvest passes correctly identified, the position errors were checked for two aspects: distance between locations within a pass and between adjacent passes. Within a pass, the distance (d) between two adjacent points $p_i$ and $p_j$ was calculated, and compared with a user-defined minimum $D_{\min}$ as per Eq. (1).

\[ d(p_i, p_j) < D_{\min} \tag{1} \]

where $p_j$ is the point observed later in time and is the point that is deleted when this algorithm returns a value less than $D_{\min}$.

Between adjacent passes, the distance (d) was calculated between two points $p_{r,i}$ and $p_{r+1,i}$ which were located in different passes (r), including the set $l \in (1, 5)$ so that distances between all points in a pass were checked against points in the identified next five successive passes. The order of succession is presently determined by the time stamp in the data file. Up to five successive passes were used to ensure that extreme positional errors and erroneous harvester paths were identified. $D'_{\min}$ is the user-defined minimum sampling distance between two rows and is used as per Eq. (2).

\[ d(p_{r,i}, p_{r+1,i}) < D'_{\min} \tag{2} \]

where $p_{r+1,i}$ is the point observed in the later pass and is the point that is deleted when this algorithm returns a value less than $D'_{\min}$.
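The two distance checks in Eqs. (1) and (2) can be sketched as follows. This is a hedged illustration rather than the software's implementation: the function names are hypothetical, points are assumed to be time-ordered with coordinates in metres, the within-pass check compares each point with the last point that was kept, and the between-pass check deletes a later point when it lies closer than the row threshold to any point of the earlier pass.

```python
import numpy as np

def filter_within_pass(xy, d_min):
    """Keep-mask for one time-ordered pass (Eq. 1): the later point of a close pair is dropped."""
    xy = np.asarray(xy, dtype=float)
    keep = np.ones(len(xy), dtype=bool)
    last = 0  # index of the most recent point that was kept (assumption)
    for j in range(1, len(xy)):
        if np.hypot(*(xy[j] - xy[last])) < d_min:
            keep[j] = False
        else:
            last = j
    return keep

def filter_between_passes(xy, pass_id, d_min_rows, n_ahead=5):
    """Keep-mask across passes (Eq. 2), checking each pass against the next n_ahead passes."""
    xy = np.asarray(xy, dtype=float)
    pass_id = np.asarray(pass_id)
    keep = np.ones(len(xy), dtype=bool)
    passes = list(dict.fromkeys(pass_id.tolist()))  # pass ids in order of first appearance (time)
    for r, pid in enumerate(passes):
        earlier = xy[(pass_id == pid) & keep]
        if len(earlier) == 0:
            continue
        for later_pid in passes[r + 1 : r + 1 + n_ahead]:
            idx = np.flatnonzero((pass_id == later_pid) & keep)
            if len(idx) == 0:
                continue
            # Distance from every point of the later pass to every point of the earlier pass.
            d = np.linalg.norm(xy[idx][:, None, :] - earlier[None, :, :], axis=2)
            keep[idx[d.min(axis=1) < d_min_rows]] = False  # the later points are deleted
    return keep

# Pass 3 is mislocated on top of pass 1; it is only caught by looking beyond the preceding pass.
pts = [(0, y) for y in range(4)] + [(9, y) for y in range(4)] + [(0.5, y) for y in range(4)]
ids = [1] * 4 + [2] * 4 + [3] * 4
keep = filter_between_passes(pts, ids, d_min_rows=5.0)
```

The pairwise distance matrix is simple but memory-hungry; for large fields a spatial index would be a more economical choice.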


Opportunity Index

Pringle et al. (2003) first proposed an Opportunity Index for SSCM (Oi) as in Eq. (3)

\[ O_i = \sqrt{M \times D \times E} \tag{3} \]

where M is the magnitude of variation relative to a certain area threshold; D is the spatial structure of variation relative to a defined management-responsive area; and E is the economic-environmental benefit of SSCM relative to uniform management. As little is known about the nature of parameter E at present, it has been assumed constant (=1). de Oliveira (2009) applied the Oi to a large number of yield data sets and discovered some issues that related to the calculation of component D. The calculations were modified and the index renamed Yieldex (Yi). The equations are described as follows (de Oliveira 2009):

\[ Y_i = \sqrt{M_V \times S_V} \tag{4} \]

where $M_V$ is the magnitude of variation and $S_V$ is the spatial structure of variation. The magnitude of variation ($M_V$) is calculated in an algorithm comprising Eqs. (5–8)

\[ M_V = \sqrt{\frac{CV_A}{q_{50}[CV_A]}} \tag{5} \]

where $CV_A$ is an areal coefficient of variation calculated for the field in question and $q_{50}[CV_A]$ is the median $CV_A$ calculated from all our available field-year samples (not limited to data in this paper). The $q_{50}[CV_A]$ is used here as an estimate of the minimum $CV_A$ needed to consider differential management practices. This value could be expected to change or be changed for different circumstances as knowledge of the range in critical $CV_A$ values increases.

\[ CV_A = \frac{\sqrt{A_C}}{\bar{X}} \times 100 \tag{6} \]

where $A_C$ is the variance of the whole field and $\bar{X}$ is the mean yield of the field.

\[ A_C = \left[ \left( \sum_{i=1}^{n} \sum_{j=1}^{n} \tfrac{1}{2} (X_i - X_j)^2 \right) \Big/ n^2 \right] - C_0 \tag{7} \]

where $X_i$ and $X_j$ are spatially separated yield observations within a field and $C_0$ is the nugget effect of a semivariance model of the whole field yield data. The $C_0$ parameter represents the random uncorrelated variation which can be a combination of natural short-scale variability and measurement or machinery operational error, while n is the number of observations. The exponential model (Eq. 8) is used to model the semivariance.

\[ \gamma(h) = C_0 + C_1 \left[ 1 - \exp\left( -\frac{h}{a_1} \right) \right] \tag{8} \]

where $C_0$ is as above and $C_1$ is the structural or spatially dependent semivariance. The $C_0 + C_1$ component is known as the plateau or sill semivariance and the distance (h) at which this point is reached is called the practical range of spatial dependence (r). The exponential model rises asymptotically towards the sill so it theoretically possesses no absolute range of spatial dependence. However, the semivariance does not effectively


increase beyond a distance which approximates (r) and is calculated by Eq. 9. At this distance, the semivariance is defined by Eq. 10 (Webster 1985).

\[ r = a_1 \times 2.966 \tag{9} \]

\[ \gamma(r) = C_0 + 0.95 \times C_1 \tag{10} \]

The structure of variation ($S_V$) is calculated as:

\[ S_V = \frac{C_D}{\sqrt{O_L}} \tag{11} \]

where $C_D$ is the maximum correlated distance in the data and is calculated as either:

i. The practical range of spatial dependence (r) when the value of (r) calculated from the fitted semivariogram model is less than the field maximum distance; or
ii. The distance (h) derived from Eq. 12 when the practical range of the fitted semivariogram model is larger than the field maximum distance

\[ \gamma(h) = C_0 + 0.95 \times \left( \gamma(\text{field max distance}) - C_0 \right) \tag{12} \]
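The choice of the maximum correlated distance can be sketched as below. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the exponential model is taken as γ(h) = C0 + C1[1 − exp(−h/a1)], and case ii uses the closed-form solution of Eq. 12 for that model, in which C0 and C1 cancel. The multiplier 2.966 is the value printed in Eq. 9 and is left as a parameter.

```python
import math

def exp_semivariance(h, c0, c1, a1):
    """Exponential semivariance model (Eq. 8)."""
    return c0 + c1 * (1.0 - math.exp(-h / a1))

def practical_range(a1, factor=2.966):
    """Practical range r (Eq. 9). Solving Eq. 10 exactly for this model gives ln(20), about 2.996."""
    return a1 * factor

def correlated_distance(a1, field_max_distance, factor=2.966):
    """Maximum correlated distance CD: case i (practical range) or case ii (Eq. 12)."""
    r = practical_range(a1, factor)
    if r < field_max_distance:
        return r  # case i: the practical range fits inside the field
    # Case ii: solve gamma(h) = C0 + 0.95 * (gamma(D) - C0) for h, with D the field maximum distance.
    # For the exponential model this reduces to 1 - exp(-h/a1) = 0.95 * (1 - exp(-D/a1)).
    target = 0.95 * (1.0 - math.exp(-field_max_distance / a1))
    return -a1 * math.log(1.0 - target)

# Hypothetical parameters: a short-range model and a long-range model in a 1500 m field.
cd_short = correlated_distance(a1=100.0, field_max_distance=1500.0)
cd_long = correlated_distance(a1=1000.0, field_max_distance=1500.0)
```

In case ii the returned distance is always smaller than the field maximum distance, which is the bounding behaviour the index relies on.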

Bounding the correlation distance within the maximum field length ensures that $S_V$ has a value that relates to the field size, and empirical evidence showed that this resulted in a more satisfactory overall Oi ranking for field years in single fields. $O_L$ is the operational area, which is defined as the ability of variable-rate machinery to react.

\[ O_L = \frac{b \times m \times s}{10000} \tag{13} \]

where b is the swath (m), m is the speed (m/s), and s is the time to alter application (s).

The structure of the software

The software was developed in Visual C++ 6.0. The program flow chart is shown in Fig. 1. A text file with information including longitude, latitude, grain mass flow, sample interval, distance travelled, cutting width, moisture content and pass number is required for the first step of yield calculation and trimming. The cleaned output file can be read directly by the second step involving opportunity assessment. The empirical variogram is calculated using the standard method of moments. The Levenberg–Marquardt nonlinear least-squares algorithm (Lourakis 2004) was used to fit the empirical variogram. A user-friendly interface is provided to guide the user through the analysis and enable alterations to the criteria used in the cleaning procedures as necessary (Fig. 2).

Study sites

The study fields are located in the states of New South Wales and South Australia, Australia. Descriptive details of these fields are given in Table 1. The yield data for the first four fields were expressly chosen from the early days of yield monitoring, where more errors are likely, to demonstrate the performance of the trimming algorithm. The distribution statistics were calculated and the differences between the spatial structure of raw yield data and cleaned yield data were then estimated. To highlight the ranking capabilities of the opportunity function (Yi) on cleaned yield data, a field with spatial data covering 9 successive seasons was used.
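The opportunity calculation defined by Eqs. 4 to 7, 11 and 13 can be sketched in Python as follows. This is an illustrative sketch and not the Visual C++ implementation: all function names are hypothetical, and Eq. 7 is read here as the field variance (the pairwise double sum divided by 2n²) minus the nugget C0, which is an assumption about the intended form.

```python
import math
import numpy as np

def areal_cv(yield_values, c0):
    """Areal coefficient of variation CVA (Eqs. 6 and 7)."""
    x = np.asarray(yield_values, dtype=float)
    # The double sum of (Xi - Xj)^2 / 2 over n^2 pairs equals the population variance.
    ac = x.var() - c0
    return 100.0 * math.sqrt(max(ac, 0.0)) / x.mean()

def magnitude_of_variation(cva, median_cva):
    """MV (Eq. 5): CVA relative to the median CVA of the available field-years."""
    return math.sqrt(cva / median_cva)

def operational_area(swath_m, speed_m_s, response_time_s):
    """OL (Eq. 13): swath x speed x time to alter application, divided by 10000."""
    return swath_m * speed_m_s * response_time_s / 10000.0

def spatial_structure(cd, ol):
    """SV (Eq. 11): maximum correlated distance over the square root of OL."""
    return cd / math.sqrt(ol)

def yieldex(mv, sv):
    """Yi (Eq. 4): geometric mean of the magnitude and structure components."""
    return math.sqrt(mv * sv)

# Using the rounded Mv and Sv values listed for the 2002 and 2004 seasons in Table 4.
yi_2002 = yieldex(2.49, 46.4)
yi_2004 = yieldex(0.93, 40.9)
```

Because Yi is a geometric mean, a season with a large magnitude component and a modest structural component (2002) can still outrank seasons whose structure is stronger but whose yield variation is small.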


[Fig. 1 flow diagram, boxes as labelled in the original]
Trim yield data: Raw yield data (.txt file); 1. Yield calculation; 2. Coordinate conversion; 3. Removal of limited yield outliers; 4. Removal of outliers through distribution of yield (globally); 5. Removal of position errors (according to distance between lines and distance between measurements in a line); 6. Removal of outliers through distribution of yield (locally); Cleaned data (.txt file)
Opportunity index: Yield empirical variogram (maximum distance, lag distances and semivariances); Best model fit (C0, C1, range); Average yield variance (AC); Areal coefficient of variation (CVA); Median CVA (q50[CVA]); Magnitude of variation (MV); Max. correlated distance (CD); Operational length (OL); Spatial structure of variation (SV); Yield opportunity index (Yi)

Fig. 1 Flow diagram for the software, which summarises how yield data were cleaned and the calculation of the site-specific crop management (SSCM) opportunity index

Results and discussion

Yield data cleaning

The benefit of checking relative positions across a number of harvest passes is shown in Fig. 3. The pass labeled ‘8’ overlaps the passes labeled ‘3’, ‘4’, ‘5’ and ‘6’, but if pass ‘8’ were only checked relative to the immediately preceding pass ‘7’, then only the northern section of pass ‘8’ would be deemed incorrect. However, when comparing the relative positions


Fig. 2 The intuitive graphic user interface for the software

Table 1 Summary statistics for raw yield data listed by field

Field   Location   Crop         Year   Area (ha)   Mean yield (t/ha)   SD yield (t/ha)   CV (%)
Glens   NSW        Faba beans   2001   52          2.56                1.21              47.2
Comet   NSW        Wheat        1997   90          3.82                1.27              33.2
Comet   NSW        Sorghum      1998   90          5.55                0.94              16.9
BT      NSW        Sorghum      1998   135         4.65                1.37              29.4
Road    SA         Wheat        1999   112         1.05                0.23              21.9
Road    SA         Barley       2000   112         3.45                0.39              11.3
Road    SA         Lentils      2001   112         1.76                0.54              30.7
Road    SA         Wheat        2002   112         0.87                0.69              79.3
Road    SA         Wheat        2003   112         2.14                0.36              16.8
Road    SA         Peas         2004   112         1.02                0.14              13.7
Road    SA         Wheat        2005   112         2.50                0.25              10.0
Road    SA         Faba beans   2007   112         0.79                0.12              15.2
Road    SA         Wheat        2008   112         2.04                0.28              13.7


across 5 successive passes it was obvious that it was mislocated and should not be included in further analysis. Table 2 shows the yield distribution statistics following the application of the full cleaning procedure to the first four fields in Table 1. In each data set, the mean yield increased following cleaning, with the increases being 4.7, 2.3, 1.4 and 5.1 %, respectively. The SD of the yield distribution decreased in all fields, resulting in decreases in the coefficient of variation (CV) of between 2.3 and 12.1 percentage points. The percentage of data points removed from the raw yield files ranged from 13.1 to 19.6 %. Figure 4 presents the location of the points deleted in each field. From Fig. 4a, it is clear that overlapping points and erroneous harvest tracks were removed. The fields in Fig. 4a and 4c were harvested more regularly, with continuous passes generally across the width of the field, so the majority of points identified for deletion occurred near the boundary (turning points) in the fields. The fields in Fig. 4b and 4d were harvested more irregularly, with harvester passes truncating at various locations in the field and, as a result, the points identified for deletion are dispersed across the fields. The quantity of erroneous points identified in these fields demonstrates that there can be numerous artifacts in crop yield data collected on-harvester, and these are likely to influence the spatial structure of a data set. The potential extent of this effect can be explored using variography. The semivariograms and their parameters which describe the spatial structure of raw and cleaned yield data are shown in Fig. 5 and Table 3. Nugget variance (C0), representing measurement error and/or random variation, was significantly

Fig. 3 Locations of mislocated raw yield data caused by failure of the GNSS positioning receiver (scale bar: 0–100 m)


Table 2 Summary statistics for yield data in four fields following each stage of the cleaning algorithm as outlined in Fig. 1

Step   Mean (t/ha)   Median (t/ha)   Min. (t/ha)   Max. (t/ha)   SD     CV (%)   Removed No.

(a) Glens 2001 (19.6 % deleted)
1      2.56          2.38            0             9.35          1.21   47.2     –
3      2.58          2.39            0.003         9.35          1.18   45.7     326
4      2.64          2.42            0.806         5.55          1.00   37.8     3 017
5      2.66          2.44            0.806         5.55          0.99   37.2     1 211
6      2.68          2.45            0.824         5.55          0.94   35.1     2 741

(b) Comet 1997 (19.6 % deleted)
1      3.82          4.06            0.35          7.01          1.27   33.2     –
3      3.82          4.06            0.35          7.01          1.27   33.2     0
4      3.84          4.07            0.65          6.94          1.25   32.5     289
5      3.85          4.08            0.65          6.94          1.24   32.2     6 026
6      3.91          4.14            0.66          6.67          1.21   30.9     3 921

(c) Comet 1998 (13.1 % deleted)
1      5.55          5.61            1.71          9.11          0.94   16.9     –
3      5.55          5.61            1.71          9.11          0.94   16.9     0
4      5.59          5.63            3.19          7.91          0.86   15.4     1 037
5      5.59          5.62            3.19          7.91          0.85   15.2     2 073
6      5.63          5.66            3.19          7.91          0.80   14.2     4 060

(d) BT 1998 (15.4 % deleted)
1      4.65          4.84            0.64          8.52          1.37   29.4     –
3      4.65          4.84            0.64          8.52          1.37   29.4     0
4      4.71          4.87            1.21          8.52          1.28   27.2     1 666
5      4.74          4.89            1.21          8.09          1.27   26.8     3 690
6      4.89          4.97            1.21          8.09          1.13   23.1     7 402

reduced in three of the four fields. The correlation range (a1) was also reduced. It is obvious from Fig. 5 that the general spatial relationships over the lag distance remained consistent between raw and cleaned data sets, with the major impact of applying the cleaning algorithm being a downward shift in the total semivariance at each lag. The changes to the variograms were consistent with previous studies on yield data cleaning operations (Simbahan et al. 2004; Sudduth and Drummond 2007), and these results will exert an influence on the calculation of the opportunity index (Yi) because C0 and a1 are important parameters in the calculation. Applying a valid cleaning process to raw yield monitor data was therefore a necessary step to be undertaken prior to using these data in assessing the opportunity for SSCM. The algorithm described in this study aims to fulfill this requirement, but it is acknowledged that further improvements could be made to individual routines. In particular, the routine that identifies each pass based on monitoring the increase or decrease in position coordinates has its limitations. At present, the first pass is assumed to be collected regularly and the remaining passes in the data set are assessed relative to that base. Specifically calculating a rolling estimate of the harvester bearing and comparing this to a threshold value may be beneficial. Providing user control of the bearing threshold (e.g. 45 or 135°) to suit different data sets may aid in identifying each pass more accurately.
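The rolling bearing idea suggested above could look something like the following sketch. It is not part of the software described in this paper: the function name, the `window` parameter and the use of a circular mean over recent steps are all assumptions made for illustration, and coordinates are assumed to be projected (x east, y north) and time-ordered.

```python
import numpy as np

def split_passes_by_bearing(xy, threshold_deg=90.0, window=5):
    """Assign a pass index to each point; a new pass starts when the step bearing
    departs from the rolling mean bearing of the current pass by more than threshold_deg."""
    xy = np.asarray(xy, dtype=float)
    step = np.diff(xy, axis=0)
    # Bearing of each step in degrees, measured clockwise from north.
    bearing = np.degrees(np.arctan2(step[:, 0], step[:, 1]))
    pass_id = np.zeros(len(xy), dtype=int)
    current = 0
    start = 0  # first step belonging to the current pass
    for k in range(1, len(step)):
        recent = np.radians(bearing[max(start, k - window):k])
        # Circular mean avoids the wrap-around problem at +/-180 degrees.
        ref = np.degrees(np.arctan2(np.sin(recent).mean(), np.cos(recent).mean()))
        change = abs((bearing[k] - ref + 180.0) % 360.0 - 180.0)
        if change > threshold_deg:
            current += 1
            start = k
        pass_id[k + 1] = current
    return pass_id

# A northward run followed by a southward run one swath (9 m) to the east.
track = [(0, 0), (0, 10), (0, 20), (0, 30), (9, 28), (9, 18), (9, 8)]
labels = split_passes_by_bearing(track)
```

Lowering `threshold_deg` makes the split more sensitive to headland manoeuvres, so the value would need to be tuned per data set, as the text suggests.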


[Fig. 4 legend: removed data, cleaned data, raw data; panels (a)–(d); scale bar 0–1,200 m]

Fig. 4 Maps showing location of raw, cleaned and removed data points for 4 seasons of yield data with known integrity issues. a Glens 2001; b Comet 1997; c Comet 1998; d BT 1998

Identifying the pass succession order using time can also raise issues if harvesting is not performed with spatially successive passes. The temporal succession can be quickly identified, but in skipping harvest paths or harvesting in spatially separated blocks, the process as applied may fail to identify some spatially erroneous points. With the increased

[Fig. 5 panels: Glens 2001, Comet 1997, Comet 1998, BT 1998; each plots semivariance against lag distance (m) for raw data and cleaned data]

Fig. 5 Empirical semivariograms of raw data and cleaned data for four yield data sets gathered from 1997 to 2001

use of GNSS vehicle navigation aids and autosteer, this issue will continue to reduce in importance. The processing of ‘old’ yield data files may require some visual inspection of the spatial array of the final cleaned data prior to mapping or other analysis. In addition, a more sophisticated method that repositions the data associated with identified positional errors instead of deleting them could be considered given that the

Table 3 The semivariogram parameters describing the spatial structure of four fields prior to, and following, application of the cleaning algorithm

Field        Data      C0     C1     A1
Glens 2001   Raw       0.62   2.30   777
             Cleaned   0.15   1.61   557
Comet 1997   Raw       0.15   5.01   1 181
             Cleaned   0      3.59   610
Comet 1998   Raw       0.19   0.69   75
             Cleaned   0.03   0.58   67
BT 1998      Raw       0.98   3.00   1 565
             Cleaned   0.37   2.34   1 116

distance between adjacent passes in each data set should remain constant within the error bounds of a user’s GNSS.

Opportunity index (Yi)

Crop yield maps constructed using local block kriging (Minasny et al. 2002) for nine seasons in a single field are shown in Fig. 6. They are displayed from highest to lowest opportunity as ranked by Yi, and the parameters of the respective semivariogram models and other components for the computation of Yi are detailed in Table 4. This ranking fits a general agronomic interpretation of the data in the crop yield maps. The top five ranked seasons show the largest within-season differences in yield across the field and/or the most coherent trends or patches of variability (i.e. less random variability). These attributes have been captured by the magnitude (M) and structural (S) components of the opportunity index (Yi). The index gives equal weight to the two components in the calculation of opportunity. The highest ranked season (2002) has by far the largest magnitude component but a lower structural component than the rest of the top five rankings. The structural component was maximised in the 2007 season where there was an obvious south-west to north-east increasing trend. Having greater variation between the higher and lower yielding areas, and/or easily partitioned variation, makes it easier to investigate the causes of variability within a field and if necessary manage inputs with variable-rate application equipment. Both of these attributes should make for greater SSCM opportunity.

A decision tree for the extent of SSCM adoption based on Yi has been proposed by de Oliveira (2009). Decisions are supported according to the amount and type of data available for use in the opportunity calculation. At the basic level, a comparison of opportunity indices between fields using single-season yield variation can help initial adopters understand their fields.
A high index would suggest that a field is more worthy of further investigation than a lower index. If three or more seasons of yield data are available for each field, a median opportunity index could be computed and used by initial adopters to direct local within-field management options. At a more advanced level, with numerous seasons of yield data available, and data from other platforms such as soil ECa or in-season

[Fig. 6 panels: yield maps (t/ha) for the 2002, 2001, 1999, 2007, 2003, 2000, 2008, 2005 and 2004 seasons, each labelled with its Yi value; scale bar 0–800 m]

Fig. 6 Nine seasons of yield maps for a single field displayed in order of SSCM opportunity as ranked by the opportunity function Yi

Table 4 Calculated and ranked opportunity index (Yi) for nine seasons of yield data in a single field. Parameters required for the calculations are also shown

Year     Mean yield   C0     C1     A1      Cd      CVa    q50    Mv     Sv     Yi
2002     0.87         0.22   0.33   294     882     59.3   16.9   2.49   46.4   10.8
2001     1.77         0.22   0.17   891     1 613   16.1   16.9   1.3    84.9   10.5
1999     1.06         0.03   0.08   2 132   1 751   13.5   16.9   1.19   92.2   10.5
2007     0.8          0.01   0.03   1 993   1 734   10.6   16.9   1.05   91.2   9.80
2003     2.15         0.09   0.09   963     1 647   9.55   16.9   1      86.7   9.31
2000     3.46         0.09   0.22   1 569   1 718   7.53   16.9   0.89   90.4   8.96
2008     2.04         0.06   0.04   805     1 586   7.57   16.9   0.89   83.5   8.62
2005     2.5          0.02   0.07   674     1 509   7.88   16.9   0.91   79.4   8.49
2004     1.02         0.01   0.01   259     777     8.26   16.9   0.93   40.9   6.17
Median                                                                          9.31

crop imagery, a combined index, as well as crop specific indices, could be calculated to aid advanced operators to develop decision-making rules at different levels of the farming operation (e.g. crop management per field, field management per farm). The integrated cleaning and opportunity algorithm presented in this paper not only works for within- and between-field evaluation, but also on a between farm scale. If opportunity information can be built up for a region, it may be possible to help improve the efficiency of resource use on a broader operational scale where control of inputs may be practiced (e.g. government allocation of input resources within regions).

Conclusions

This paper proposes the integration of yield data cleaning and opportunity index calculation in a single software framework. Raw yield data can be fed into the program, the data automatically cleaned with preset/user-controlled values and the opportunity index calculated. The data can then be ranked based on its opportunity as influenced by the magnitude and the structure of the observed spatial variation. The results of the index ranking are shown to provide an agronomically sensible outcome for a field with nine cropping seasons. The software developed for this process should assist farmers or advisors to make further SSCM investment decisions, at a range of detail levels, in a robust and convenient way.

Acknowledgments Thanks to the Chinese Scholarship Council for providing funds for the research. The program will be made available from the website of the Australian Centre for Precision Agriculture (ACPA), University of Sydney. We acknowledge the support of the Australian Grains Research and Development Corporation (GRDC) and the Southern Precision Agriculture Association (SPAA).

