
______________________________________________________________

School of Mathematical Sciences
Queensland University of Technology

Integrated Bayesian Network frameworks for modelling complex ecological issues

Sandra Johnson BSc (University of Natal)

Principal Supervisor: Prof Kerrie Mengersen

Associate Supervisor: Prof Tony Pettitt

A thesis submitted for the degree of

Doctor of Philosophy in the Faculty of Science and Technology, Queensland University of Technology according to QUT requirements

2009

______________________________________________________________

KEYWORDS

Acinonyx jubatus, algae, algal bloom, Bayesian network, BN, Botswana, cheetah, conservation, cheetah relocation, DOOBN, dynamic network, free-ranging cheetah population, integrated network, IBNDC, integrated Bayesian network development cycle, Lyngbya majuscula, model integration, Namibia, network validation, object oriented, OOBN, population viability, relocation, South Africa, viable population


ABSTRACT

Ecological problems are typically multifaceted and need to be addressed from both a scientific and a management perspective. There is a wealth of modelling and simulation software available, each designed to address a particular aspect of the issue of concern. Choosing the appropriate tool, making sense of the disparate outputs, and taking decisions when little or no empirical data is available are everyday challenges facing the ecologist and environmental manager. Bayesian Networks (BNs) provide a statistical modelling framework that enables analysis and integration of information in its own right as well as integration of a variety of models addressing different aspects of a common overall problem. There has been increased interest in the use of BNs to model environmental systems and issues of concern. However, the development of more sophisticated BNs, utilising dynamic and object oriented (OO) features, is still at the frontier of ecological research. Such features are particularly appealing in an ecological context, since the underlying facts are often spatial and temporal in nature. This thesis focuses on an integrated BN approach which facilitates OO modelling. Our research devises a new heuristic method, the Iterative Bayesian Network Development Cycle (IBNDC), for the development of BN models within a multi-field and multi-expert context. Expert elicitation is a popular method used to quantify BNs when data is sparse but expert knowledge is abundant. The resulting BNs need to be substantiated and validated taking this uncertainty into account. Our research demonstrates the application of the IBNDC approach to support these aspects of BN modelling.

The complex nature of environmental issues makes them ideal case studies for the proposed integrated approach to modelling. Moreover, they lend themselves to a series of integrated sub-networks describing different scientific components, combining scientific and management perspectives, or pooling similar contributions developed in different locations by different research groups.

In southern Africa the two largest free-ranging cheetah (Acinonyx jubatus) populations are in Namibia and Botswana, where the majority of cheetahs are located outside protected areas. Consequently, cheetah conservation in these two countries is focussed primarily on the free-ranging populations as well as the mitigation of conflict between humans and cheetahs. In contrast, in neighbouring South Africa, the majority of cheetahs are found in fenced reserves. Nonetheless, conflict between humans and cheetahs remains an issue here. Conservation effort in South Africa is also focussed on managing the geographically isolated cheetah populations as one large meta-population. Relocation is one option among a suite of tools used to resolve human-cheetah conflict in southern Africa. Successfully relocating captured problem cheetahs, and maintaining a viable free-ranging cheetah population, are two environmental issues in cheetah conservation forming the first case study in this thesis.

The second case study involves the initiation of blooms of Lyngbya majuscula, a blue-green alga, in Deception Bay, Australia. L. majuscula forms toxic blooms which have severe health, ecological and economic impacts on the community located in their vicinity. Deception Bay is an important tourist destination owing to its proximity to Brisbane, Australia’s third largest city. Lyngbya is one of several algae that form Harmful Algal Blooms (HABs). This group includes the algae responsible for other widespread blooms such as red tides. Lyngbya blooms are not a local phenomenon: blooms of this toxic weed occur in coastal waters worldwide. With the increase in frequency and extent of HABs, it is important to gain a better understanding of the underlying factors contributing to the initiation and sustenance of these blooms. This knowledge will contribute to better management practices and the identification of those management actions which could prevent or diminish the severity of these blooms.


Contents

Introduction
    Research Aims and Questions
    Thesis Overview
    References
Literature Review
    2.1 Bayesian Networks
    2.2 Dynamic Bayesian Networks (DBN)
    2.3 Object Oriented Paradigm
    2.4 Object Oriented Bayesian Networks (OOBN)
    2.5 Dynamic Object Oriented Bayesian Networks (DOOBN)
    2.6 Integrated Bayesian Networks
    2.7 Bayesian Network Validation and Evaluation
    2.8 Review of BN Modelling
    2.9 Population Viability Analysis
    2.10 Cheetah Conservation
    2.11 Harmful Algal Blooms
    2.12 References
An Integrated Bayesian Network Approach to Lyngbya majuscula Bloom Initiation
    3.1 Introduction
    3.2 Methods
        3.2.1 Bayesian Network (BN)
        3.2.2 Object Oriented Bayesian Network (OOBN)
        3.2.3 Dynamic Object Oriented Bayesian Network (DOOBN)
        3.2.4 Integrated Bayesian Network (IBN)
    3.3 Results
    3.4 Discussion
    3.5 References
Integrating Bayesian Networks and a GIS-based Nutrient Hazard map of Lyngbya majuscula
    4.1 Introduction
    4.2 Background
    4.3 Methods
    4.4 Results
    4.5 Discussion
    4.6 Conclusions
    4.7 Acknowledgements
    4.8 References
    4.9 Appendix to Conference Paper
Modelling cheetah relocation success in southern Africa using an Iterative Bayesian Network Development Cycle
    5.1 Introduction
    5.2 Methods
        5.2.1 Study Area
        5.2.2 Iterative Bayesian Network Development Cycle (IBNDC)
    5.3 Results
        5.3.1 Core Process
        5.3.2 Iterative Process
    5.4 Discussion
    5.5 References
    5.6 Appendix
        A1 Table 5.3: Description of nodes of Protected Fenced Relocation BN (Fig. 5.6)
        A2 CPTs for Protected Fenced Relocation BN (Fig. 5.6, Table 5.3)
        A3 Motivation for nodes in Cheetah Relocation BNs
            Target Nodes
            Key Factors (Nodes)
        A4 Relocation into protected unfenced areas
        A5 Relocation into unprotected unfenced areas
        A6 In situ relocation
        A7 Ecological suitability
        A8 Site Factors
Viability of the free-ranging cheetah population in Namibia - an Object Oriented Bayesian Network Approach
    6.1 Introduction
    6.2 Methods
    6.3 Results
    6.4 Discussion
    6.5 Literature Cited
    6.6 Appendix
        6.6.1 Background to Human subnetwork
        6.6.2 Background to Ecological subnetwork
        6.6.3 Background to Biological subnetwork
A Bayesian Network approach to modelling temporal behaviour of Lyngbya majuscula bloom initiation
    7.1 Introduction
    7.2 Temporal Node Quantification
        7.2.1 Rainfall
        7.2.2 Number of Previous Dry Days
        7.2.3 Temperature
        7.2.4 Wind Direction
        7.2.5 Wind Speed
        7.2.6 Surface Light
    7.3 Lyngbya Bloom Initiation
    7.4 Discussion and Conclusions
Discussion


List of Figures

Figure 2.1: A partial hierarchy of Graphical Models to illustrate where BNs and OOBNs fit within the scope of graphical modelling. Abbreviations: CART – Classification and Regression Trees, HMM – Hidden Markov Models, ICA – Independent Component Analysis, KFM – Kalman Filter model, MR – Multiple Regression, MRF – Markov Random Field, PCA – Principal Component Analysis (Murphy 2002)
Figure 2.2: Example DAG showing the outcome of interest as the target node, with two colour coded groupings of nodes. The first group (nodes C2, C3, C4) represent a typical converging relationship, while the second group (nodes C5, C6, C7) show a diverging connection. Additionally C5, C6 and C2 form a serial connection
Figure 2.3: HMM represented as a type of DBN (Ghahramani, 2001)
Figure 3.1: UML use case diagram of the conceptual processes in the Lyngbya bloom initiation Integrated Network
Figure 3.2: UML use case diagram of the processes for the Lyngbya bloom initiation DOOBN
Figure 3.3: UML activity diagram detailing the processes for the Lyngbya bloom initiation IBN
Figure 3.4: Science Network for Lyngbya initiation (Netica®)
Figure 3.5: Rainwater OOBN sub-network showing two output nodes, Groundwater Amount and Land Run-off Load, which are then connected to the input nodes Prev Groundwater and Prev Land Run-off in the next time slice
Figure 3.6: Science OOBN sub-network
Figure 3.7: Five time slices forming the DOOBN for Lyngbya bloom initiation
Figure 3.8: Expanded sub-network instances in Hugin®, showing the interface nodes for each instance. The input and output nodes are represented here as ellipses with broken and solid lines, respectively. Also evident are the directed links between the sub-network instances of the same and the next time slice, so that information from one time slice can flow into the next time slice.
Figure 3.9: Extract of the Management Network for Mellum Creek Sub-catchment, a visual representation of the sub-catchment, showing point and diffuse sources of nutrients. The inset shows the complete Management Network
Figure 3.10: Probability of Lyngbya bloom initiation
Figure A.1: Extract of the Management Network for Mellum Creek Sub-catchment, a visual representation of the sub-catchment, showing point and diffuse sources of nutrients. The inset shows the complete Management Network
Figure A.2: Meso-scale nutrient hazard map of the Deception Bay and southern Pumicestone Passage area including the addition of pine plantations, Melaleuca, and ASS (Pointon et al., 2008)
Figure A.3: Lyngbya Science BN reviewed on 1st December 2008
Figure A.4: Conceptual diagram of scenario testing (Use Case Diagram)
Figure A.5: Activity Diagram for Example Scenario: Change natural vegetation to agriculture
Figure 5.1: Current cheetah distribution and relocation sites in South Africa (Marnewick et al., 2007)
Figure 5.2: Cheetah estimates in Botswana by predator management zones (Klein, 2007)
Figure 5.3: Conceptual representation of the Iterative BN Development Cycle (IBNDC)
Figure 5.4: UML Use Case diagram showing the interactions between the expert teams (modelling and validation) and the IBNDC processes
Figure 5.5: UML Activity diagram demonstrating the IBNDC processes
Figure 5.6: Conceptual network for relocation into protected fenced areas showing the node groupings at the end of the Core Process. The nodes were assigned to six groups: Area Characteristics (green), Existing population (light blue), Management Issues (blue), External Support (yellow), Direct Factors (light green), Survival (orange). The node descriptions are in Table 5.3 and several CPTs are in the Appendix.
Figure 5.7: Site factors subnetworks for relocation on protected fenced sites (top), protected unfenced sites (left) and unprotected unfenced sites (right).
Figure 6.1: Map of Namibia showing the density of the cheetah population in Namibia (Marker, 2002)
Figure 6.2: Interface nodes for the three OOBN subnetworks. The Human OOBN subnetwork has five output nodes: Human Population Growth, Cheetah removal, Human Habitat Impact, Prey poaching and Land use. The Ecological OOBN subnetwork has four input nodes: Cheetah removal, Human Habitat Impact, Prey Poaching, Land use and two output nodes: Prey availability, Intraspecific competition. The Biological OOBN subnetwork has four input nodes: Human population growth, Cheetah removal, Prey availability and Intraspecific density and three output nodes: Recruitment, Immigration-emigration and Mortality which all feed into the target node of the combined network, Cheetah population viability
Figure 6.3: The thirteen nodes of the Human Factors OOBN subnetwork. The output nodes have a double line and are: Cheetah removal, Land use, Human Population Growth, Human Habitat Impact, Prey poaching
Figure 6.4: The final version of the Ecological Factors subnetwork showing the input nodes Human Habitat Impact, Prey poaching, Land Use and Cheetah removal from the Human factor OOBN, and the output nodes Prey poaching and Intraspecific Density.
Figure 6.5: The Biological Factors subnetwork showing the input nodes Human population growth and Cheetah removal from the Human factor OOBN and Intraspecific density and Prey availability from the Ecological factor OOBN, and the output Cheetah population viability.
Figure 6.6: Posterior probabilities of the combined OOBN, with the target node, Free-ranging cheetah population viability, in the top left corner of the figure showing the probability of 52.4% of being viable and 47.6% of declining.
Figure 7.1: Box plot of intensity of Lyngbya blooms in Deception Bay from January 2000 to December 2005
Figure 7.2: Lyngbya Science BN as a generic OOBN showing the input nodes with broken lines (Hugin®)
Figure 7.3: Mean Rainfall patterns (Nov 1999 to Oct 2005)
Figure 7.4: Mean number of days which had low, medium and high previous dry days (Nov 1999 to Oct 2005)
Figure 7.5: Box plot of the number of days with a low minimum temperature (Jan 2000 to Oct 2005)
Figure 7.6: Mean number of days per month for N, SE and Other wind direction (Nov 1999 to Oct 2005)
Figure 7.7: Box plot of the number of days with a high average wind speed (Nov 1999 to Oct 2005)
Figure 7.8: Box plot of the number of days that had adequate surface light, grouped by month (Nov 1999 to Oct 2005)
Figure 7.9: Lyngbya Science OOBN for December (Hugin®)
Figure 7.10: Line graph of the probability of Lyngbya bloom initiation from the temporal BNs


List of Tables

Table 3.1: Conditional probability table for Bottom Current Climate node with states Low and High and parent nodes Wind Direction (states North, SE and Other), Wind Speed (states Low and High) and Tide (states Spring and Neap). These nodes, their states, probabilities and relationships are visible in the Bayesian network in figure 3.4
Table 3.2: Changes to the probability of Lyngbya bloom initiation for key factors. All possible states for each of the nodes were assessed individually to ascertain the delta effect it had on the probability of a Lyngbya bloom initiation
Table 3.3: Point and diffuse sources contributing nutrients to Deception Bay
Table 4.1: Nutrient hazard factors for management land uses
Table 4.2: Management land use showing the average hazard from the hazard map and the corresponding probability of having enough dissolved nutrients in the available pool
Table 5.1: Evidence sensitivity analysis for posterior network (protected fenced BN), showing calculated entropy
Table 5.2: Mutual information between the target node (Success-site) of the Protected Fenced Relocation BN and the other variables
Table 5.3: Description of nodes of Protected Fenced Relocation BN (Fig. 5.6)
Table 5.4: Conditional Probability Table for Predator Threat in Protected Fenced Relocation BN
Table 6.1: Conditional probability table (CPT) of Female mate choice node with states increase and decrease and parent nodes Cheetah removal, Intraspecific density and Immigration/emigration
Table 6.2: Entropy values for the nodes in the combined OOBN. The entropy value for the target node (Free-ranging cheetah population viability) is shown in italics at the top of the table as a reference for the values of the other nodes. The entropy can be considered as a measure of how ‘uninformative’ a variable is. Therefore the larger the value, the more random the distribution (Kjaerulff and Madsen, 2007)
Table 6.3: Mutual information between the hypothesis variable (Free-ranging cheetah population viability) and each of the variables listed in the table. They are ordered in descending order so that those variables sharing most information with the target node are at the top of the list (Kjaerulff and Madsen, 2007)
Table 7.1: Monthly Rainfall (%)
Table 7.2: Previous number of dry days (%)
Table 7.3: CPT for Minimum Temperature (%)
Table 7.4: CPT for Wind Direction (%)
Table 7.5: CPT for Wind Speed (%)
Table 7.6: CPT for Surface Light (%)

Acknowledgements

My sincere thanks go to my family, Paul, Brenda, Mark, Doug and Catherine for helping me to follow my dream, even at my ripe old age! Without their tremendous support I would not have been able to see this through.

A special thanks to my supervisor, Prof Kerrie Mengersen, for the opportunity to conduct this research and for all her inspiration, support, guidance and compassion during what has seemed a very long journey and which has spanned the death of my father, Elwyn Kenneth Furness, and my younger son’s best friend, Toby East (12), who was like one of our family, as well as the diagnosis of cancer of my brother-in-law Graham mac Kechnie.

A special thanks to my sister and best friend, Brenda mac Kechnie, for her unfailing love, support and encouragement. For the encouragement of many friends and family, especially my parents-in-law, Ken and Margaret Johnson, and friends Rowena Dunn, Riekje and Adam East, Alison and Terry Reilly, I owe a great debt. The support from our local church, especially Lauris Clarke and Barbara Finger, was invaluable. To the staff and fellow postgraduate students: thanks for making my time at QUT so enjoyable, especially Carla spoiling us with home baked goodies.

A big thank you to de Wildt Cheetah & Wildlife Trust for hosting the first Cheetah Bayesian Network workshop and especially to the researchers Kelly Marnewick and Deon Cilliers. A special thank you to Eloise and her staff at de Wildt Lodge for the exquisite meals and great hospitality. A big thank you also to Cheetah Conservation Botswana for your enthusiastic participation and expert knowledge: Rebecca Klein, Ann Marie Houser, Kyle Good and Lorraine Boast and the rest of the team at Mokolodi: Brian and Wabotle, not forgetting the great cheetah educators and ambassadors Duma and Letotsi. A huge thank you to Alta de Waal at Meraka Institute, CSIR, South Africa for your enthusiasm, friendship, statistical expertise and hospitality. Heartfelt thanks to Laurie Marker and everyone at Cheetah Conservation Fund in Namibia for hosting the second Cheetah BN workshop, and for your tireless and constructive participation in the workshop as well as the great hospitality shown to us during our stay. I would also like to thank Chris Brown and Robin Lines from Namibia Nature Foundation for their collaboration on wild dogs and their hospitality.

This thesis would not have been possible without the generous sponsorship of the School of Mathematical Sciences, the Australian Research Council’s International Linkage Grant, financial assistance from the Environmental Protection Agency and Australian Government through the South East Queensland Healthy Waterways Partnership, the ARC Centre for Dynamic Systems and Control, and QUT Institute for Sustainable Resources.

This thesis is dedicated to God and to the loving memory of those who are resting peacefully in His presence: specifically Ria Jacobs, Ken Furness, Ben Werner, Toby and Dan East. Without God’s love, grace and guidance I could not have done this.

Chapter 1 Introduction

Dealing with ecological problems is inherently complex. Management decisions have to be made when little or no evidence is available, yet inaction is not a viable option as that may have disastrous consequences. There are typically many factors involved and not all may be known at the time. The environmental manager has to assess available knowledge from a scientific and a management perspective to make the best possible informed decisions and implement the most appropriate management options to mitigate the environmental issue of concern. Moreover, there is an abundance of specialised statistical modelling and simulation software available for analysis and integration of available data and other information, each designed to address a particular aspect of the problem. Choosing appropriate tools and making sense of the disparate outputs are everyday challenges facing the ecologist and environmental manager. Bayesian Networks (BNs) provide a statistical modelling framework that enables analysis and integration of information in its own right as well as integration of a variety of models addressing different aspects of a common overall problem. This thesis focuses on this framework and addresses the following overall aims.

Research Aims and Questions

The aims of this thesis are threefold:

1. To develop a Bayesian network framework approach which facilitates the derivation of integrated statistical models within a multi-field and multi-expert context
2. To integrate disparate statistical models of an ecological problem within a Bayesian network framework
3. To apply these approaches to substantive ecological problems


There is thus a dual focus in this thesis: a methodological focus (aims 1 and 2), which extends the BN framework for complex environmental modelling, and an application focus (aim 3), which involves two environmental case studies. The first methodological aim is proactive, where my objective is to create integrated models, and the second is reactive, where I am confronted with several existing models which need to be integrated to create a more comprehensive picture of the situation, or to enable better informed decisions to be made in the face of uncertainty.

Although BNs are growing in popularity in environmental disciplines (Uusitalo, 2007), the development of more sophisticated BNs, utilising dynamic and object oriented (OO) features, is still at the frontier of ecological research, where the available data are sparse and the underlying biological and physical models very complex. In addition, environmental problems lend themselves to a series of integrated sub-networks describing different scientific components, combining scientific and management perspectives, or pooling similar contributions developed in different locations by different research groups. In this thesis I do not develop BN methods per se, but instead I develop approaches to BN modelling using a BN framework and an OO paradigm to enable system integration, and the extension and redesign of existing models. Furthermore, this thesis does not explore and compare other statistical approaches for the ecological problems considered here. Instead, these aspects present opportunities for further research. The methodological research questions of this thesis are defined below.

The complex nature of environmental issues makes them ideal case studies for the proposed integrated approach to modelling. The first case study is the conservation of cheetahs, Acinonyx jubatus, in southern Africa and the second the initiation of blooms of Lyngbya majuscula, a blue-green alga, in Deception Bay, Australia.
The first case study is motivated by the fact that the IUCN status for cheetah is Vulnerable, VU C2a(i). This status means that it is considered to have a high risk of extinction in the wild (IUCN, 2007). Namibia and Botswana have the largest free roaming populations, so maintaining a viable wild cheetah population is the foremost priority for cheetah conservation in these countries (Purchase et al., 2007). The majority of South Africa is privately owned, with only about 5% held as state-owned, protected areas (Cummings, 1991). The free roaming cheetah population there occurs at the northern border with Botswana, with the remaining cheetah population confined mainly to reserves scattered widely throughout the country. This has necessitated the creation of a metapopulation management plan to manage geographically separated populations (Marnewick et al., 2007). The main threats to cheetahs are interspecific competition, increased contact and conflict with humans and fragmented habitat (GCCAP, 2002). This can lead to cheetahs being perceived as problem animals by farmers and local communities. Relocation is a strategy often favoured by conservationists to deal with problem cheetahs. We thus identified two issues of concern in this first case study which I will address in this thesis: (i) the success of cheetah relocations within the broader context of long-term viability of wild cheetah metapopulations and (ii) the viability of the free roaming cheetah population in Namibia. This case study was undertaken in collaboration with the Cheetah Conservation groups in Namibia, Botswana and South Africa.

The second case study involves the cyanobacterium (blue-green algae), Lyngbya majuscula. Lyngbya grows on the sediment or over seagrass, algae or coral and, when conditions are favourable, it goes through a rapid growth phase, resulting in a substantial increase in biomass, commonly referred to as a bloom (Ahern et al., 2007). Blooms occur naturally in tropical and subtropical coastal areas worldwide, including Deception Bay, Australia, which is the study area for the second case study (Dennison et al., 1999). With its proximity to Brisbane, Australia’s third largest city, Deception Bay is a popular tourist destination. It also has a history of Lyngbya blooms, which appear to be increasing in both frequency and extent (Dennison and Abal, 1999). The many waterways feeding from intensive and rural agricultural activities into the Bay, and its use for commercial and recreational fishing, put pressure on the marine environment and compound the substantial environmental and health issues resulting from such a toxic cyanobacterium. It is therefore imperative to develop a better understanding of the scientific and management factors impacting on Lyngbya bloom initiation. The ecological issue of concern that is addressed in this case study is therefore the initiation of Lyngbya majuscula blooms in Deception Bay, Australia. This case study was undertaken as part of a project with South East Queensland (SEQ) Healthy Waterways Partnership, a collaboration between government, industry, research and the community, which coordinated the Lyngbya Research and Management Program (2005-2007).
These two case studies complement each other well, as they entail different dynamics and challenges, but both have international implications and both present complex ecological problems. Conservation of endangered predators and the strategies employed to protect them are common across many countries and species. Similarly Lyngbya and other nuisance algal blooms occur worldwide and have similar environmental, health and financial issues. Furthermore, the initiation of a Lyngbya bloom and the conservation of cheetahs are complex issues, affected by many different environmental and management factors and interactions. Moreover, the expert groups in these two case studies are very different. The Lyngbya experts are a large team of experts from varied scientific disciplines as well as a management team from local and State government, whereas the cheetah experts are smaller teams bringing different country experiences to the table. There was little existing data for cheetah relocations in either South Africa or Botswana, but there was a great deal of expert knowledge available. Although the factors leading to Lyngbya bloom initiation were not clearly understood, there was a substantial amount of data available for some of the factors and a high level of expert knowledge on other factors and interactions. This discussion, in the context of the overall aims of this thesis, motivates several methodological and research questions.


The methodological research questions are:
M1. How can we use a Bayesian network as an integration tool for ecological problems which have several existing statistical models?
M2. How can we utilise object oriented modelling concepts in Bayesian network design?
M3. How do we design for network integration, without compromising a multi-disciplinary approach?
M4. How do we maintain the independence and specialist focus of subnetworks when we specifically design for integrated and comprehensive Bayesian networks?

The applied research questions which are formulated with respect to the three ecological issues of concern are:
A1. Can we meaningfully integrate disparate models of Lyngbya majuscula to provide a more comprehensive model of bloom initiation?
A2. How can the hazard ratings of nutrients translate to potential changes in the probability of a Lyngbya bloom initiating?
A3. How do we collate the available knowledge on cheetah relocations to improve the success of a relocation event within the wider context of the viability of wild cheetah metapopulations in South Africa?
A4. How do we create an integrated BN model of the viability of the free roaming cheetah population in central northern Namibia without compromising on the breadth and depth of the investigation?
A5. Can we apply the BN design techniques from the cheetah models to Lyngbya models and increase the return on investment in expert elicitation?

These methodological and applied questions are directly addressed by chapters 3 to 7 and are discussed in more detail in the Thesis Overview.

Thesis Overview

The thesis has been written as a series of papers which have been submitted to journals and are included here in their entirety. Chapters 3 to 7 each address one or more research questions. The Publication Details section below lists these papers, their current status, the co-authors and the journals to which the papers were submitted.


Chapter 2 comprises a Literature Review of traditional Bayesian networks, dynamic Bayesian networks, the object-oriented paradigm on which the IBNDC heuristic developed in this thesis is based, object oriented Bayesian networks and dynamic object oriented Bayesian networks. I also summarise a paper by Uusitalo (2007) reviewing the advantages and disadvantages of Bayesian networks in environmental modelling and management. This literature review provides the background and foundation for the methodological component of the thesis and the application of these models to the two case studies. The literature review is additional to the information already provided in the individual papers included as chapters in this thesis. Chapter 3 is motivated by the first methodological (M1) and applied (A1) research questions. I demonstrate the way in which an integrated Bayesian network (IBN) can be created to combine three disparate models (M1). This was achieved by using a management centric network to quantify a management action in the catchment area and then simulate this action through a catchment model to obtain the modified nutrient loads. The Bayesian network which describes the scientific factors affecting the initiation of a Lyngbya bloom (Science BN) is then updated with the modified loads and recompiled to forecast the effect on the probability of a Lyngbya bloom initiation. Furthermore by utilising an OO framework the Science BN, which was created as part of the SEQ Healthy Waterways Lyngbya Project, was transformed from a static BN to a dynamic BN to capture the temporal nature of Lyngbya blooms and model the bloom initiation at a finer scale. This work was done in collaboration with Ms Fiona Fielding, Dr Grant Hamilton and Professor Kerrie Mengersen, who facilitated the workshops and modelled the science and management BNs and who are the co-authors for this paper. 
My contribution has been the development of the IBN approach to extend and transform the static Science BN, created by my collaborators, into a dynamic OO BN. This work demonstrates that we can in fact meaningfully integrate scientific models of Lyngbya majuscula with management models (A1). By using the same framework to further develop the model of Lyngbya bloom initiation, I also addressed research question M4.

Chapter 4 also deals with the first methodological research question (M1) by devising a flexible process to integrate three models which were created during the Lyngbya Research and Management Program (2005-2007). Two models, the science and management networks, are mentioned in Chapter 3 and the third model is a GIS based model. This model is a Nutrient Hazard Map and was produced by co-authors Mr Shane Pointon, Mr Chris Vowles and Mr Col Ahern to give an indication of an area’s potential to export nutrients of concern to coastal waters. I demonstrate the integration process (M1) through an example scenario whose outcome is expressed as a probability of Lyngbya bloom initiation, which deals with the second applied research question (A2). Furthermore, the third methodological question (M3) is addressed by retaining the specialist models as fully functional entities whilst allowing them to interact with each other through the newly defined interface. All the co-authors, who included Ms Kathleen Ahern and Professor Kerrie Mengersen, actively participated in the integration workshops, which I facilitated, to discuss and refine the proposed integration process. The integrated model provides a better understanding of the impact of changes in management practices in the catchment area on Lyngbya bloom initiation. I am first author on the paper which reports on this research. I solicited opinions from my co-authors and consolidated their feedback and that of the reviewers. The paper was refereed and accepted for the Queensland Coastal Conference 2009, Waves of Change, held at the Gold Coast on 12-15 May 2009. An extended version of this paper is in preparation for submission to Human and Ecological Risk Assessment and supplementary diagrams are provided in the Appendix to this chapter.

Chapter 5 is predominantly concerned with the second methodological research question (M2) and to some extent the fourth (M4). This motivated the introduction of a new heuristic, the Iterative Bayesian Network Development Cycle (IBNDC), which is used to formulate several networks distinguishing between the unique relocation experiences and conditions in Botswana and South Africa. This application of the IBNDC satisfies the third applied research question (A3). The IBNDC is conducive to OO BNs, facilitating the design of dynamic networks and the refinement, reuse and redesign of existing BNs. This research was conducted during a four day workshop at de Wildt Lodge, South Africa in November 2007. The workshop was jointly funded by Queensland University of Technology (QUT) in Australia, de Wildt Cheetah and Wildlife Trust and Council for Scientific and Industrial Research (CSIR) in South Africa. I initiated this research, contacted cheetah experts in Botswana and South Africa and a statistician versed in BN modelling, and coordinated and arranged a subsidy for the workshop. The workshop was co-hosted by senior statistician, Alta de Waal, from Meraka Institute, CSIR, South Africa, Professor Kerrie Mengersen and me.
The cheetah experts involved in this study were Deon Cilliers and Kelly Marnewick, researchers with de Wildt Cheetah Wildlife Trust, and Ann Marie Houser and Lorraine Boast, researchers with Cheetah Conservation Botswana. This research has been reported in a journal paper, for which I am first author and which is co-authored by all the workshop participants. The paper has been accepted subject to revision by Ecological Modelling.

Chapter 6 is motivated by the fourth applied research question (A4). A workshop was held at Cheetah Conservation Fund, Namibia in June 2008 to model the viability of wild cheetah populations in Namibia. I initiated and planned the workshop with the support of the founder of Cheetah Conservation Fund (CCF), Dr Laurie Marker. The workshop was attended by cheetah researchers from CCF, Namibia: Dr Laurie Marker, Chris Gordon, Anne Schmidt-Küntzel, Matti Nghikembua, Fabiano Ezequiel, Burton Gaiseb; cheetah experts from Tanzania: Dr Bettina Wachter and Jörg Melzheimer; and a representative from the Ministry of the Environment and Tourism in Namibia, Josephine Henghali. The workshop was partly funded by CCF and QUT. Professor Kerrie Mengersen and I jointly facilitated the workshop and guided participants in using the IBNDC methodology to develop three subnetworks in parallel. This addressed the third and fourth methodological questions (M3, M4) by producing three distinct, self-contained, yet integrated subnetworks, each focussing on a different discipline such as anthropology, habitat ecology and cheetah biology. The three expert teams continued the development, testing and refinement of their subnetworks after the workshop and then submitted them to me for integration into the overall model. The journal paper is co-authored by workshop participants, and I consolidated the contributions and solicited feedback from the co-authors prior to submission to Journal of Animal Ecology.

Chapter 7 addresses the fifth applied research question (A5). The study uses the IBNDC heuristic (although this is not directly referenced in the paper) within an OO framework to investigate the temporal behaviour of Lyngbya bloom initiation. Two aspects of OO modelling used in this study are reuse and subclassing, which are applied to both the network and expert opinion; the study therefore deals with the second methodological question (M2). I was able to transform the static Lyngbya Science BN very simply into an OOBN so that it could be reused for the monthly time slices. Furthermore, by using the specific data for each month of interest I ‘subclassed’ the OOBN to take on the specific values for that month while retaining the elicited values for all the non-temporal nodes. This addressed methodological questions M2 and M4. Similarly, expert opinion elicited for the static Science BN was reused and modified for the dynamic BN (M2). My role in this study was facilitating the review of the static Science BN and converting it into monthly time slices as identified by the science review team. I am first author for this paper, which has been co-authored with Professor Kerrie Mengersen. This paper has been peer reviewed and accepted for presentation at the 18th World IMACS Congress and MODSIM09 International Congress on Modelling and Simulation in Cairns, Australia in July 2009.

Chapter 8 provides a final discussion which summarises and discusses the significance and limitations of this research.
Possible future applications of the methodology developed in this thesis are proposed and areas which would be suitable for further research are identified.


References

Ahern K S, Ahern C R, Savige G M and Udy J W 2007 Mapping the distribution, biomass and tissue nutrient levels of a marine benthic cyanobacteria bloom (Lyngbya majuscula) Marine and Freshwater Research 58 883-904
Cummings D H M 1991 Developments in game ranching and wildlife utilisation in east and southern Africa, Wildlife Production: Conservation and Sustainable Development, ed L A Renecker and R J Hudson (Fairbanks: University of Alaska) pp 96-108
Dennison W C and Abal E G 1999 Moreton Bay Study: A Scientific Basis for the Healthy Waterways Campaign (Brisbane: South East Queensland Regional Water Quality Management)
Dennison W C, O’Neil J M, Duffy E J, Oliver P E and Shaw G R 1999 Blooms of the cyanobacterium Lyngbya majuscula in coastal waters of Queensland, Australia Bulletin de l’Institut Oceanographique, Monaco 19 501-506
GCCAP 2002 Global Cheetah Action Plan Review final workshop report. In: Global Cheetah Conservation Action Plan - Workshop, ed P Bartels, et al. (Shumba Valley Lodge, South Africa: Apple Valley, MN: IUCN / SSC Conservation Breeding Specialist Group) p 78
IUCN 2007 The IUCN Red List of Threatened Species.
Marnewick K, Beckhelling A, Cilliers D, Lane E, Mills G, Herring K, Caldwell P, Hall R, Meintjes S 2007 The Status of the Cheetah in South Africa Summary of Country Reports CAT News, Special Edition 3 22-31
Purchase G K, Marker L, Marnewick K, Klein R and Williams S 2007 Regional assessment of the status, distribution and conservation needs of cheetahs (Acinonyx jubatus) in southern Africa: Summary of Country Reports CAT News, Special Edition 3 44-46
Uusitalo L 2007 Advantages and challenges of Bayesian networks in environmental modelling Ecological Modelling 203 312-8


Chapter 2 Literature Review

The literature review is chiefly sited within the papers included as chapters 3 to 7. It has deliberately been left in this form so that the papers remain intact as a self contained contribution to the thesis. However, additional background information is provided in this chapter for three reasons: (1) to set the case studies presented in this thesis into a broader ecological context, (2) to introduce the key object oriented (OO) concepts which motivated the integrated BN design approach developed in this research, and (3) to present more detailed information on the methodologies used in the two case studies. First, in Figure 2.1 we present a partial model hierarchy to illustrate where Bayesian Networks (BN) and related models such as Object Oriented BNs (OOBN), Dynamic BNs (DBN) and Dynamic OOBNs (DOOBN) fit into the broader scope of graphical models. This diagram is based on an illustration in Murphy (2002), which also gives a detailed background to BNs and DBNs.

Figure 2.1: A partial hierarchy of Graphical Models to illustrate where BNs and OOBNs fit within the scope of graphical modelling. Abbreviations: CART – Classification and Regression Trees, HMM – Hidden Markov Models, ICA – Independent Component Analysis, KFM – Kalman Filter model, MR – Multiple Regression, MRF – Markov Random Field, PCA – Principal Component Analysis (Murphy 2002)


The literature review is organised in the following way: Bayesian networks (BNs) and their key characteristics are introduced in the first section, 2.1. A specialisation of a BN, suitable for temporal models and known as a dynamic BN, is then described in section 2.2. Thereafter, the object oriented (OO) paradigm is discussed in section 2.3, highlighting those OO concepts which are key to the integrated modelling approach and the heuristic developed in this thesis. The following two sections, 2.4 and 2.5, briefly review two types of BN models which incorporate OO concepts: an object oriented Bayesian network (OOBN) and a dynamic object oriented Bayesian network (DOOBN) which were used in the two case studies presented in this thesis. Section 2.6 reviews integrated BNs and section 2.7 the validation and evaluation of BNs, an important part of the BN modelling process. The last section on BN modelling, section 2.8, discusses alternative approaches to BN modelling and the advantages and disadvantages of BN modelling with reference to some comprehensive review papers. The next three sections introduce the broader ecological contexts in which the three environmental issues, addressed in this thesis, are embedded. The viability of the free-ranging cheetah is embedded in the larger context of population viability analysis reviewed in section 2.9. The relocation of problem cheetahs is one of the strategies used in cheetah conservation (section 2.10) and the initiation of Lyngbya majuscula blooms in Deception Bay, Australia, is embedded in the larger context of harmful algal blooms (section 2.11), which occur in coastal waters worldwide.

2.1. Bayesian Networks

A Bayesian network is a mathematical model (Pearl 1988; Neapolitan 1990; Jensen and Nielsen 2007) which provides a graphical representation of key factors and interactions for an outcome of interest (Borsuk et al., 2006; McCann et al., 2006; Jensen and Nielsen, 2007; Uusitalo, 2007) such as the success of a cheetah relocation or the initiation of a Lyngbya bloom. These factors are represented as nodes in the diagram and their dependencies on other factors and the outcome of interest are depicted as directed links to form a directed acyclic graph (DAG) (Lauritzen, 2003). The variables (factors) in a BN may be at different temporal and spatial scales and the data represented in the network may originate from diverse sources such as empirical data, expert opinion and simulation outputs (Borsuk et al., 2006; McCann et al., 2006; Jensen and Nielsen, 2007; Park and Stenstrom, 2008). Underlying each of the nodes is a conditional probability table whose entries are indexed by the states the node can be in and by the states of its parent nodes. The BN provides probabilities for each node, including the outcome of interest, given the factors influencing them, their interactions and their individual conditional probabilities (Pearl, 1988; Jensen and Nielsen 2007).


Figure 2.2: Example DAG showing the outcome of interest as the target node, with two colour coded groupings of nodes. The first group (nodes C2, C3, C4) represents a typical converging relationship, while the second group (nodes C5, C6, C7) shows a diverging connection. Additionally C5, C6 and C2 form a serial connection.

An example of a DAG is shown in Figure 2.2. The BN has a probabilistic framework that describes the strength of the relationships between the variables (Pollino et al., 2007b; van der Gaag et al., 2007). BNs are a useful statistical tool for collating, organising and formalising information such as empirical data, model outputs, secondary sources and expert knowledge about the issue of concern (Uusitalo, 2007). There has been increased interest in using BNs in natural resource management (Borsuk et al., 2004; Bromley et al., 2005; Pollino et al., 2007) and environmental issues (Castelletti and Soncini-Sessa 2007; Marcot et al., 2006; Smith et al., 2007). A directed path between two nodes, X and Y, consists of the sequence of the nodes linking X and Y, such that each node in the sequence is the parent of the following node (Ghahramani, 2001). From the example DAG in figure 2.2 one directed path is {C5, C6, C2, Outcome of interest}, and another directed path is {C4, C2, Outcome of interest}. By definition, in a DAG no directed path starts and ends at the same node, so no feedback loops are allowed (Saddo et al., 2005; Jensen and Nielsen, 2007; Uusitalo, 2007; Park and Stenstrom, 2008). The absence of any feedback loops simplifies belief propagation through the BN when evidence is entered into the network. The nature of the evidence we receive may be ‘hard’ or ‘soft’. When we receive information that the variable is in a certain state, the evidence is ‘hard’. Any other evidence is considered ‘soft’, for example ‘the variable is more likely to be in certain states than others’. 
Soft evidence can be viewed as prior probability updates of the states of the variable (Taroni et al., 2006; Weber and Jouffe, 2006). The two key characteristics of BNs are directional separation (d-separation) and the assumption of the Markov property. These properties greatly simplify probability calculations in BNs.

D-separation

D-separation is an important concept in BNs which simplifies complex probability calculations. The seminal book on the rules of d-separation is by Pearl (1988). Subsequently Lauritzen et al. (1990) documented an alternative criterion for determining d-separation. D-separation concerns the blocking of information flow between nodes.


To illustrate the concept of information blocking, we consider three possible types of connections between nodes in a BN:
• Serial connection
• Diverging connection
• Converging connection

Serial Connection

In figure 2.2 nodes C5, C6 and C2 are serially connected. If the state of C6 is known, then it prevents the flow of information from C5 to C2. In other words the flow of information is blocked between them and we say that C5 is d-separated from C2 given C6. This type of reasoning typically applies to ‘chain’ reasoning such as we have in a Markov Chain (Taroni et al., 2006; Jensen and Nielsen, 2007).

Diverging Connection

The light brown nodes in figure 2.2, C6, C5 and C7, form a diverging connection. Information flow is blocked here if the state of C5 is known. In this case more information about the state of C6 will not change the belief about the state of C7 and likewise, knowledge about the state of C7 provides no additional information about the state of C6. In this case C6 is d-separated from C7 given C5 (Taroni et al., 2006; Jensen and Nielsen, 2007).

Converging Connections

The green nodes in figure 2.2, C3, C4 and C2, form a converging connection. This type of connection requires slightly more complex reasoning and the fact that BNs can deal with this type of reasoning is a great benefit (Taroni et al., 2006; Jensen and Nielsen, 2007). If the state of C2 is known, then information about the state of C3 provides information about the state of C4 (and vice versa), so in this case C3 and C4 are d-connected. However, if there is no information about the state of C2 (or of any of its descendants), then knowing the state of C3 does not provide any information about the state of C4. In this situation C3 and C4 are d-separated.
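The three cases above can be checked mechanically. The following is a minimal sketch in Python, not part of the original analysis. It assumes the link structure that the text attributes to Figure 2.2 (the figure itself may contain further nodes and links), and it tests d-separation with the ancestral moral graph construction, a standard equivalent of the blocking rules described above. The names `EDGES`, `parent_map` and `d_separated` are illustrative only.

```python
from itertools import combinations

# Parent -> child links inferred from the description of Figure 2.2
# (an assumption: the figure itself may contain further nodes and links).
EDGES = [
    ("C3", "C2"), ("C4", "C2"),   # converging connection at C2
    ("C5", "C6"), ("C5", "C7"),   # diverging connection from C5
    ("C6", "C2"),                 # completes the serial connection C5 -> C6 -> C2
    ("C2", "Outcome"),
]


def parent_map(edges):
    """Return {node: set of parents} for every node mentioned in edges."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(parent, set())
        parents.setdefault(child, set()).add(parent)
    return parents


def d_separated(edges, x, y, given=()):
    """True if x and y are d-separated given the instantiated nodes in `given`."""
    parents = parent_map(edges)
    given = set(given)

    # 1. Keep only x, y, the evidence nodes and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents[node])

    # 2. Moralise: link each node to its parents, link co-parents, drop directions.
    neighbours = {node: set() for node in keep}
    for child in keep:
        for parent in parents[child]:
            neighbours[child].add(parent)
            neighbours[parent].add(child)
        for a, b in combinations(parents[child], 2):
            neighbours[a].add(b)
            neighbours[b].add(a)

    # 3. Remove the evidence nodes and look for any remaining path from x to y.
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        if node not in seen:
            seen.add(node)
            stack.extend(neighbours[node] - given - seen)
    return True


if __name__ == "__main__":
    print(d_separated(EDGES, "C5", "C2", given={"C6"}))  # serial, C6 known
    print(d_separated(EDGES, "C6", "C7", given={"C5"}))  # diverging, C5 known
    print(d_separated(EDGES, "C3", "C4"))                # converging, C2 unknown
    print(d_separated(EDGES, "C3", "C4", given={"C2"}))  # converging, C2 known
```

Under these assumptions the first three calls report d-separation and the last reports d-connection, matching the serial, diverging and converging cases in the text.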

Markov Property

The other important characteristic of BNs is the assumption of the Markov Property, which means that if we know the present, then the past has no influence on the future. In other words, there are no other direct dependencies in the network over and above those already represented by directed links in the network (Jensen and Nielsen, 2007). This gives rise to the concept of a Markov blanket for a node, which effectively constitutes the neighbours of the node. The Markov blanket is made up of the parents of the node, the children of the node and any nodes which share a child with that node. Moreover, if the Markov blanket of a node is instantiated then the node is d-separated from the rest of the network (Murphy, 2002; Jensen and Nielsen, 2007).

Let $U = \{X_1, \ldots, X_n\}$ be a set of nodes in a Bayesian network. From the multiplication law in probability theory we can represent the joint probability distribution of these variables as shown in (2.1):

$$P(U) = \left[ \prod_{i=2}^{n} P(X_i \mid X_1, \ldots, X_{i-1}) \right] P(X_1) \tag{2.1}$$

However, because BNs have the Markov property we can rewrite this joint probability much more simply as shown in (2.2), which is known as the chain rule for Bayesian networks (Taroni et al., 2006; Jensen and Nielsen, 2007):

$$P(U) = \prod_{i=1}^{n} P(X_i \mid \mathrm{pa}(X_i)) \tag{2.2}$$

where $\mathrm{pa}(X_i)$ are the parents of $X_i$.
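To give a feel for how much simpler (2.2) is than (2.1), the sketch below writes out the chain-rule factorisation for the nodes described for Figure 2.2 and counts the free probabilities needed with and without it. It is an illustration only. The parent sets are inferred from the text, every node is assumed to be binary, and the function names are hypothetical.

```python
# Parent sets inferred from the description of Figure 2.2 (an assumption:
# the figure itself may contain further nodes and links).
PARENTS = {
    "C3": [],
    "C4": [],
    "C5": [],
    "C6": ["C5"],
    "C7": ["C5"],
    "C2": ["C3", "C4", "C6"],
    "Outcome": ["C2"],
}


def factorisation(parents):
    """Write the chain rule (2.2) as a product of one term per node."""
    terms = []
    for node, pa in parents.items():
        terms.append(f"P({node} | {', '.join(pa)})" if pa else f"P({node})")
    return " ".join(terms)


def factorised_parameters(parents, states=2):
    """Free probabilities needed by the conditional tables in (2.2).

    Each node needs (states - 1) numbers for every configuration of its parents.
    """
    return sum((states - 1) * states ** len(pa) for pa in parents.values())


def full_joint_parameters(parents, states=2):
    """Free probabilities needed to tabulate the joint distribution directly."""
    return states ** len(parents) - 1


if __name__ == "__main__":
    print(factorisation(PARENTS))
    print(factorised_parameters(PARENTS), "versus", full_joint_parameters(PARENTS))
```

With seven binary nodes the factorised form needs 17 probabilities, against 127 for an unstructured joint table, which is why the parent-only conditioning matters in practice, particularly when the probabilities have to be elicited from experts.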

In other words, the probability distribution of a Bayesian network is the product of the conditional probabilities of all the variables in the BN, conditioned only on their parents (Ghahramani, 2001; Taroni et al., 2006; Jensen and Nielsen, 2007). Bayesian networks are used to calculate probabilities after entering evidence into the network. We now consider two nodes $X_i$ and $X_j$ which are d-separated after entering evidence $e$. Therefore $P(X_i \mid X_j, e) = P(X_i \mid e)$ and the chain rule for BNs then becomes the equation shown in (2.3):

$$P(U, e) = \prod_{i=1}^{n} P(X_i \mid \mathrm{pa}(X_i)) \prod_{j=1}^{m} e_j \tag{2.3}$$

and for $X \in U$ we have

$$P(X \mid e) = \frac{\sum_{U \setminus \{X\}} P(U, e)}{P(e)} \tag{2.4}$$

where $e_1, \ldots, e_m$ are the individual findings that make up the evidence $e$ (Jensen and Nielsen, 2007). Furthermore, approximation algorithms are often used for inference with BNs, because exact inference is NP-hard (NP: nondeterministic polynomial time) (Ross and Zuviria, 2007).
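As a small numerical illustration of (2.3) and (2.4), the sketch below applies brute-force enumeration to the converging connection C3 → C2 ← C4. All probabilities are hypothetical values chosen for the example, not elicited or estimated quantities, and enumeration is only feasible for very small networks, in line with the complexity remark above. Each hard finding is treated as a 0/1 factor, as in (2.3).

```python
from itertools import product

STATES = (True, False)
NAMES = ("C3", "C4", "C2")

# Hypothetical probabilities, for illustration only.
P_C3 = {True: 0.1, False: 0.9}
P_C4 = {True: 0.2, False: 0.8}
P_C2_TRUE = {  # P(C2 = True | C3, C4)
    (True, True): 0.95,
    (True, False): 0.80,
    (False, True): 0.70,
    (False, False): 0.05,
}


def joint(world):
    """Chain rule (2.2): P(C3) P(C4) P(C2 | C3, C4) for one full assignment."""
    p_c2_true = P_C2_TRUE[(world["C3"], world["C4"])]
    p_c2 = p_c2_true if world["C2"] else 1.0 - p_c2_true
    return P_C3[world["C3"]] * P_C4[world["C4"]] * p_c2


def posterior(query, evidence):
    """P(query | evidence) by enumeration, following (2.3) and (2.4)."""
    scores = {state: 0.0 for state in STATES}
    for values in product(STATES, repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        # Each hard finding contributes 1 if the assignment agrees with it, else 0.
        finding = 1.0
        for node, observed in evidence.items():
            finding *= 1.0 if world[node] == observed else 0.0
        scores[world[query]] += joint(world) * finding  # sum of P(U, e) over U \ {X}
    p_e = sum(scores.values())                          # P(e)
    return {state: p / p_e for state, p in scores.items()}


if __name__ == "__main__":
    print(posterior("C3", {"C4": True})[True])              # C2 unknown
    print(posterior("C3", {"C2": True})[True])              # C2 known
    print(posterior("C3", {"C2": True, "C4": True})[True])  # C2 and C4 known
```

With these illustrative numbers, learning C4 alone leaves the belief in C3 at its prior of 0.1, whereas once C2 is known the belief in C3 is about 0.34 and drops to about 0.13 when C4 is also observed. This is the d-connection through a converging node described earlier.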

2.2. Dynamic Bayesian Networks (DBN)

A Dynamic Bayesian Network (DBN) is a traditional BN with a temporal dimension (Weber and Jouffe, 2006) where interdependent entities change over time (Ross and Zuviria, 2007). DBNs are used to model time series data (Ghahramani, 2001) and are ideally suited to object oriented modelling techniques (Jensen and Nielsen, 2007). Ross and Zuviria (2007) contributed to the methodological development of DBNs by evolving DBNs using a multi-objective genetic algorithm. They created a network structure modelling the causal relationships that explain an example sequence of multivariate data using a genetic algorithm. They constructed the DBN from two DAGs: a traditional BN, representing the static state of the network, and a transition network representing the dynamic relationships with stationary transition probabilities. They used a network’s probability score and structural complexity score as the criteria for a multi-objective evaluation strategy with a genetic algorithm. The Pareto ranking scheme is a popular choice for such an application as it aims to preserve the independence of objectives by retaining a set of possible, legitimate solutions, with respect to the population at large. Pareto ranking also balances the effect of the likelihood and structural simplicity terms used in the basic Bayesian information criterion (BIC) network evaluation heuristic. The basic structural scoring formula used by Ross and Zuviria (2007) attempts to keep the number of links in the network roughly equal to the number of variables. Both Ghahramani (2001) and Murphy (2002) explore the role of hidden Markov modelling in the context of DBNs. Hidden Markov models (HMM) are often used to model time series data and Ghahramani (2001) demonstrates that HMMs are a special type of DBN. This fact has changed the understanding of HMMs and as a result enables hidden Markov modelling of more complex models (Ghahramani 2001). Hidden Markov modelling is based on two assumptions. The first assumption is that an observation at time $t$ was generated by a process whose state, $S_t$, is hidden from the observer. The second assumption is that given the state $S_{t-1}$, the current state $S_t$ is independent of all the states prior to $t-1$. This assumption satisfies the Markov property and in this example the process would satisfy a first-order Markov property. In other words only the information contained in state $S_{t-1}$ is needed to predict state $S_t$ and all the prior states are disregarded. An $n$-th order Markov property is one where given states $S_{t-1}, \ldots, S_{t-n}$, $S_t$ is independent of $S_\tau$ for $\tau < t - n$. For this time series the prediction of state $S_t$ is informed by states $S_{t-1}$ to $S_{t-n}$ but not by states prior to $S_{t-n}$. In a similar way the states of the outputs of the process would also satisfy a Markov property, i.e. 
given state $S_t$, $Y_t$ is independent of the states and observations of the other time slices. Hence the joint distribution of a sequence of states and observations can be factored as follows:

$$P(S_{1:T}, Y_{1:T}) = P(S_1) P(Y_1 \mid S_1) \prod_{t=2}^{T} P(S_t \mid S_{t-1}) P(Y_t \mid S_t) \tag{2.5}$$

Figure 2.3 shows the factorisation in (2.5) represented graphically as a DBN, with each variable corresponding to a node in the diagram and the directed arcs indicating the nodes on which each variable is conditionally dependent (Ghahramani 2001).

Figure 2.3: HMM represented as a type of DBN (Ghahramani, 2001)
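The factorisation in (2.5) can be evaluated directly for a short sequence. The following sketch uses a hypothetical two-state HMM. The state names, observation names and all probabilities are invented for illustration and are not taken from either case study.

```python
# Hypothetical two-state HMM; every number below is illustrative only.
INITIAL = {"low": 0.6, "high": 0.4}                      # P(S_1)
TRANSITION = {                                           # P(S_t | S_{t-1})
    "low": {"low": 0.7, "high": 0.3},
    "high": {"low": 0.4, "high": 0.6},
}
EMISSION = {                                             # P(Y_t | S_t)
    "low": {"no_bloom": 0.9, "bloom": 0.1},
    "high": {"no_bloom": 0.2, "bloom": 0.8},
}


def hmm_joint(states, observations):
    """Joint probability of a state sequence and an observation sequence, as in (2.5)."""
    if not states or len(states) != len(observations):
        raise ValueError("states and observations must be non-empty and equally long")
    # First time slice: P(S_1) P(Y_1 | S_1)
    prob = INITIAL[states[0]] * EMISSION[states[0]][observations[0]]
    # Remaining slices: P(S_t | S_{t-1}) P(Y_t | S_t) for t = 2, ..., T
    for t in range(1, len(states)):
        prob *= TRANSITION[states[t - 1]][states[t]]
        prob *= EMISSION[states[t]][observations[t]]
    return prob


if __name__ == "__main__":
    print(hmm_joint(["low", "high"], ["no_bloom", "bloom"]))
```

For the two-slice sequence shown, the product is 0.6 × 0.9 × 0.3 × 0.8 = 0.1296. Each additional time slice multiplies in one transition term and one emission term, which is the repeated structure drawn in Figure 2.3.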


Another popular model which can be represented as a type of DBN is the Kalman filter, or linear Gaussian state-space model, which is the continuousstate version of HMMs (Ghahramani 2001).

2.3. Object Oriented Paradigm

Object-oriented (OO) analysis and design has been embraced by the programming community since the early 1980s (Kifer et al., 1995). However, the application of the OO paradigm to BN design and construction was only introduced to the Artificial Intelligence (AI) community more than a decade later by Koller and Pfeffer (1997), when they defined object oriented Bayesian networks (OOBN). The benefits of OOBN modelling in the environmental community have not yet been realised, although traditional BN modelling has been growing in popularity in this field (Bromley et al., 2005; McCann et al., 2006; Smith et al., 2007; Uusitalo, 2007). OOBNs are discussed in more detail in section 2.5 below. The OO paradigm for programming is viewed by some as a natural progression (evolution), but by others as a revolution (Riel, 1996). Nonetheless, this new approach to software development required a simultaneous shift away from the traditional waterfall model for software development, consisting of five distinct steps (analysis, design, coding, testing and maintenance), to an iterative model allowing feedback at each of the five stages (Riel, 1996). The salient elements of the OO paradigm are encapsulation, abstraction, modularity, interface and inheritance (Grady, 1994; Riel, 1996).

Encapsulation

Encapsulation is the term used to describe the hiding of information. Access to certain information is allowed only via a pre-defined interface, so that the data, the behaviour and the details of the implementation are effectively encapsulated within an object (Monarchi and Puhr, 1992; Chidamber and Kemerer, 1994; Basili et al., 1996; Pastor et al., 2001). This enables us to modify the internal details of an object without affecting other objects (Nelson et al., 1996).

Abstraction

Abstraction refers to the essential or fundamental parts of an object (Grady, 1994). What is considered essential can vary with the user’s perspective, so there are different levels of abstraction depending on the focus and purpose of the model (Monarchi and Puhr, 1992; Grady, 1994; Skillicorn and Talia, 1998; Steimann, 2000). Grady (1994) draws a parallel between abstraction and people’s search for similarity to cope with complexity in objects or situations. An example of the use of abstraction is the software package WinBUGS (the Windows-based version of BUGS, Bayesian inference Using Gibbs Sampling), where DAGs are used to communicate the essential structure of the model, thereby hiding the details of the distributional assumptions and deterministic relationships (Lunn et al., 2000).

Modularity

Modularity is often used interchangeably with reuse, and refers to the modules of a system being constructed in such a way that they form a set of loosely connected units (Grady, 1994; Pastor et al., 2001). Depending on the level of modularity, it can greatly simplify program modifications by breaking down a complex problem into smaller problems, and it can facilitate contributions from several researchers (Nelson et al., 1996; Lunn et al., 2000).

Interface

An interface determines the integration between objects. It is critical that this interface remains fixed so that changes may be made to an object without affecting another object to which it is connected, except through the predetermined interface (Lunn et al., 2000). The interface specifies the way in which other objects may interact with the object, and therefore additional objects may be linked to it in the future, even though the interface may have been specifically designed for a different integration (Riel, 1996). In other words, we may view an object as an encapsulation of its structure, data and behaviour which can only be accessed through its well defined interface (Monarchi and Puhr, 1992). WinBUGS is an example of a software package designed in this way, so that new versions of the software do not invalidate assumptions based on earlier versions (Lunn et al., 2000).

Inheritance

Inheritance is a key concept in the OO paradigm and in AI (Kifer et al., 1995) and facilitates reuse (Harel and Gery, 1997). If an object inherits properties from another object, then it can perform the same functions as the object it inherited from, but it may (and probably does) perform these functions in a different way. The inheritance relationship between two objects is often referred to as an ‘is-a’ subclassing relationship (Harel and Gery, 1997). Inheritance can be viewed as a hierarchy of objects having single or multiple inheritance (Grady, 1994).
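The elements described above can be illustrated with a minimal sketch in Python. The class names, the species and the growth rates are hypothetical and serve only to show how encapsulation, a fixed interface, abstraction and an ‘is-a’ inheritance relationship appear in code; they do not represent any model in this thesis.

```python
from abc import ABC, abstractmethod


class Species(ABC):
    """Abstraction: only the details needed by the model are represented."""

    def __init__(self, name, population):
        self.name = name
        self._population = population  # encapsulated state, private by convention

    # Interface: other objects interact with a Species only through
    # population() and project(), so the internals can change freely.
    def population(self):
        return self._population

    def project(self, years):
        """Projected population after a number of years of compound growth."""
        return self._population * (1 + self.annual_growth_rate()) ** years

    @abstractmethod
    def annual_growth_rate(self):
        """Each subclass supplies its own growth rate."""


class Predator(Species):
    """Inheritance: a Predator is-a Species with its own behaviour."""

    def annual_growth_rate(self):
        return 0.02  # hypothetical value


class Prey(Species):
    def annual_growth_rate(self):
        return 0.10  # hypothetical value


cheetah = Predator("cheetah", 100)
print(cheetah.project(1))
```

Both subclasses perform the same function, project, but they do so with different behaviour, which is the sense in which an inheriting object "may perform these functions in a different way".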

2.4. Object Oriented Bayesian Networks (OOBN)

Although BNs have been used widely and successfully in many disciplines, they are inadequate when modelling large complex domains (Koller and Pfeffer, 1997). Object Oriented Bayesian networks (OOBN) are BNs with instance nodes. An instance node represents an instance of another network, which could itself contain instance nodes. Interface nodes (input nodes and/or output nodes) enable connectivity with other OOBNs (Hugin, 2007; Jensen and Nielsen, 2007). This enables the construction of complex and dynamic models (Koller and Pfeffer, 1997).

The seminal paper in Object Oriented BN theory by Koller and Pfeffer (1997) describes an object oriented BN language that facilitates the modelling of complex domains in terms of inter-related objects. In a traditional BN the network structure and nodes are specific to the particular domain for which the network was created. Reusing nodes and network fragments in other BNs is a manual and tedious process of copying and pasting nodes and conditional probability tables (CPT) to other BNs. In addition, if there are any changes to the base network fragment, they have to be manually applied to all the other network fragments. In the programming environment these problems have largely been overcome by the use of object oriented programming languages, which provide the framework for code reuse through abstract data types. In a similar vein, OOBNs provide a framework for modelling large complex data structures by simplifying the knowledge representation and facilitating reuse of nodes and network fragments.

Koller and Pfeffer (1997) define key terms of the OOBN language such as basic and complex types, simple and complex objects, a value type, a stochastic function, an object oriented network fragment, an object oriented Bayesian network, interface subtypes and subclasses. The basic building block in an OOBN is an object, which can be a physical entity, an abstract entity, or a relationship between two entities. A random variable is the most basic object in an OOBN. Complex objects have a set of attributes: for example, a person is a complex object with attributes such as hair colour, eye colour, height and weight, whereas a person’s hair colour is a simple object taking a value within a finite range. Each object is viewed as a stochastic function from the inputs to the outputs, i.e. for each value of its inputs a probability distribution over the value of its outputs is returned. A CPT in a traditional BN would be a simple stochastic function defining a random variable in a BN. To define a complex object, a stochastic function is assigned to each of its attributes and the attributes are connected in a Bayesian network. The input and output attributes of an object are referred to as its interface. The internal parts of an object are encapsulated within the object, which means that, from a probabilistic point of view, the encapsulated attributes are d-separated from the rest of the network by the interface (Jensen and Nielsen, 2007).
The definition of classes of objects in OOBNs enables a more generic, reusable model to be described, which can then be used in different contexts. A class is a generic network fragment, and when this class is instantiated it is called an object. A class may be instantiated many times (Jensen and Nielsen, 2007). Several classes may share common substructures. These subclasses can inherit many attributes and behaviours from the parent class, which they can then modify and enhance. The parent class can be viewed as being more abstract than its subclasses, with only the important details being retained, whereas subclasses define more specific attributes and behaviours. The ability to create subclasses that inherit properties from another class is a well known and very useful characteristic of object oriented modelling (Koller and Pfeffer, 1997).
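The idea of an object as a stochastic function from its inputs to its outputs, and of a class as a generic fragment that can be instantiated many times, can be sketched as follows. This is an illustrative toy only: the class names, the three-node chain (Input, Hidden, Output) and all CPT values are assumptions made for the example, and the code is neither the Hugin API nor the OOBN language of Koller and Pfeffer (1997).

```python
class Fragment:
    """Hypothetical OOBN class: a chain Input -> Hidden -> Output of binary nodes.

    Only the input and output nodes form the interface; the hidden node and
    both CPTs are encapsulated inside the object.
    """

    # CPT rows are indexed by the parent state (all values are hypothetical).
    _P_HIDDEN = [[0.9, 0.1], [0.2, 0.8]]   # P(Hidden | Input)
    _P_OUTPUT = [[0.7, 0.3], [0.1, 0.9]]   # P(Output | Hidden)

    def __call__(self, input_dist):
        """Stochastic function: distribution over Input -> distribution over Output."""
        hidden = self._propagate(input_dist, self._P_HIDDEN)
        return self._propagate(hidden, self._P_OUTPUT)

    @staticmethod
    def _propagate(parent_dist, cpt):
        # P(child) = sum over parent states of P(parent) * P(child | parent)
        return [
            sum(parent_dist[i] * cpt[i][j] for i in range(len(parent_dist)))
            for j in range(len(cpt[0]))
        ]


class ModifiedFragment(Fragment):
    """Subclass: inherits the structure and P(Hidden | Input), overrides one CPT."""

    _P_OUTPUT = [[0.5, 0.5], [0.0, 1.0]]


# The same class is instantiated twice; the output of the first object is
# connected to the input of the second through the interface only.
first, second = Fragment(), Fragment()
out_first = first([0.5, 0.5])
out_second = second(out_first)
out_modified = ModifiedFragment()([0.5, 0.5])
print(out_first, out_second, out_modified)
```

A change to the CPTs in the class definition is picked up by every instance, which is the reuse benefit that manual copying of network fragments in a traditional BN lacks.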

2.5. Dynamic Object Oriented Bayesian Networks (DOOBN)

The use of DOOBNs in ecological applications is rare, despite the fact that many ecological problems are temporal in nature. Two relatively recent applications of DOOBN methodologies were reported by Molina et al. (2005) and Weber and Jouffe (2006). However, the latter study was not in an ecological context; instead, Weber and Jouffe (2006) used DOOBNs to represent complex manufacturing processes.

The study by Molina et al. (2005) applies spatio-temporal BNs to assist during flood emergencies. During a flood, decisions need to be taken when the sources of information are imperfect and incomplete. The BN was included in a computer system to provide assistance to operators in real time. Their research found that the visual representation of the model in the BN was useful in creating confidence among decision makers in the results produced by the system. According to Molina et al. (2005), Bayesian networks provide a natural and intuitive description of hydrological processes based on a symbolic representation with qualitative variables and causal relations, which is very useful for formulating decision models with high levels of abstraction and explicit meaning. The ability to represent uncertainty in a BN model was seen to be a great benefit over deterministic models.

Weber and Jouffe (2006) suggest a methodology for DOOBNs based on their case study to optimise the diagnosis and maintenance of complex manufacturing processes. These processes have to be dynamically modelled and controlled. They found DOOBNs to be a very powerful tool for facilitating optimal decision-making in these situations, and found that the dependency between several failure modes of a component was easily modelled by a BN, as were common modes (Weber and Jouffe, 2006).

2.6. Integrated Bayesian Networks

Bayesian networks are suited to the modelling and interrogation of complex environmental issues (Bromley et al., 2005) such as algal blooms, but to our knowledge an IBN approach has not previously been applied to cheetah relocations or to Lyngbya bloom initiation. Common practice in current research is to create larger, more complex BNs to incorporate the different aspects of the issue of concern, as demonstrated in Bashari et al. (2009). They report on the integration of a state transition model (STM) and a BN to create a rangeland dynamics model. This integration is achieved by transforming the STM into a BN and then combining the two BNs into one large BN. The integration techniques proposed in this thesis integrate BNs within an OO framework, which simplifies the integration, retains the original networks, and facilitates both the parallel development of subnetworks and the reuse of network fragments and expert knowledge. There is very little literature on combining disparate BNs in ecology. This research aims to address this knowledge gap.

2.7. Bayesian Network Validation and Evaluation

Sensitivity analysis of a BN is an essential part of the evaluation process. It assesses the sensitivity of the target node(s) both to variations in the evidence entered into the network (evidence sensitivity) and to variations in the values of the parameters (parameter sensitivity) (Varis and Kuikka, 1999; Bednarski et al., 2004; Pollino et al., 2007).


Evidence sensitivity

Evidence sensitivity measures the degree of variation in the BN’s posterior distribution resulting from changes in the evidence entered into the network. Using these values, we are able to rank the evidence nodes, which assists the expert in targeting future data collection and in identifying any errors in the BN structure or CPTs (Pollino et al., 2007). Two popular measures of evidence sensitivity are entropy and mutual information (Pollino et al., 2007). Entropy, H(X), measures the randomness of a variable and is calculated as follows (Pearl, 1988; Korb and Nicholson, 2004; Pollino et al., 2007b):

$$H(X) = -\sum_{x} P(x) \log P(x) \tag{2.6}$$

where P(x) is the probability distribution of X. We can interpret this value as the average additional information necessary to specify an alternative (Das, 2000; Pollino et al., 2007). The other measure of evidence sensitivity is mutual information, I(X,Y), which gives an indication of the effect that one random variable, X, has on another variable, Y, and is calculated as follows (Korb and Nicholson, 2004; Pollino et al., 2007b):

$$I(X,Y) = H(X) - H(X \mid Y) \tag{2.7}$$
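The two measures in (2.6) and (2.7) can be computed directly from a discrete joint distribution, as in the minimal sketch below. The joint tables are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not specified above; a different base only rescales the values.

```python
import math


def entropy(dist):
    """H(X) = -sum_x P(x) log P(x), as in (2.6); zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def mutual_information(joint):
    """I(X,Y) = H(X) - H(X | Y), as in (2.7), for a table joint[x][y] = P(x, y)."""
    p_x = [sum(row) for row in joint]          # marginal of X
    p_y = [sum(col) for col in zip(*joint)]    # marginal of Y
    # H(X | Y) = sum_y P(y) * H(X | Y = y)
    h_x_given_y = 0.0
    for y, p in enumerate(p_y):
        if p > 0:
            conditional = [joint[x][y] / p for x in range(len(joint))]
            h_x_given_y += p * entropy(conditional)
    return entropy(p_x) - h_x_given_y


# Hypothetical joint distributions over two binary variables.
independent = [[0.25, 0.25], [0.25, 0.25]]   # X and Y independent
dependent = [[0.5, 0.0], [0.0, 0.5]]         # X fully determined by Y

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
print(mi_independent, mi_dependent)
```

In the independent case the mutual information is zero, while in the fully dependent case it equals the entropy of X, since observing Y removes all uncertainty about X.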

Mutual information represents the extent to which the joint probability of X and Y differs from what it would have been if X and Y were independent (Korb and Nicholson, 2004). Consequently, a value of 0 means that the variables are in fact mutually independent (Pearl, 1988).

Parameter sensitivity

It is useful to describe the change in the posterior probability of the target query that results from changes in the probability parameters of a BN by means of a mathematical sensitivity function (Wang et al., 2002). One-way sensitivity analysis is done by varying one of the parameters while keeping all the others fixed and then measuring the variation in the output probability (Bednarski et al., 2004). To do this, a sensitivity function is required for the output probability in terms of the parameter, x, being varied. This sensitivity function is defined in (2.8) below and is the quotient of two linear functions in the parameter being varied (van der Gaag et al., 2007):

$$f(x) = \frac{\alpha x + \beta}{\gamma x + \delta} \tag{2.8}$$

where α, β, γ and δ are constants built from the parameters which are fixed.


The sensitivity value of the parameter x with respect to the target probability is obtained by taking the first derivative of the sensitivity function (Laskey, 1995; van der Gaag et al., 2007) and is given by the following equation:

$$f'(x) = \frac{\alpha\delta - \beta\gamma}{(\gamma x + \delta)^{2}} \tag{2.9}$$

Using parameter sensitivity we can therefore identify those parameters which cause the biggest changes in the posterior probabilities of the outcome of interest. Efforts can then be directed at improving the accuracy of those parameters (Pollino et al., 2007b) and at focusing expert elicitation (van der Gaag et al., 2007). Wang et al. (2002) define a measure of the importance of a parameter with respect to multiple queries and evidence scenarios, and use it to focus attention on refining the corresponding prior probabilities.
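As a worked illustration of (2.8) and (2.9), consider a hypothetical two-node network A → B in which the target is P(A=1 | B=1) and the parameter being varied is x = P(B=1 | A=1). By Bayes' theorem the target is P(A=1)x / (P(A=1)x + P(A=0)P(B=1 | A=0)), which has the form of (2.8) with α = γ = P(A=1), β = 0 and δ = P(A=0)P(B=1 | A=0). The sketch below, in which the network and all probability values are assumptions made for the example, checks the sensitivity function against direct inference and the sensitivity value against a numerical derivative.

```python
def posterior(x, p_a=0.3, q=0.2):
    """P(A=1 | B=1) in the network A -> B, as a function of x = P(B=1 | A=1).

    p_a = P(A=1) and q = P(B=1 | A=0) are the fixed (hypothetical) parameters.
    """
    joint_a1 = p_a * x          # P(A=1, B=1)
    joint_a0 = (1 - p_a) * q    # P(A=0, B=1)
    return joint_a1 / (joint_a1 + joint_a0)


def sensitivity_function(x, alpha, beta, gamma, delta):
    """f(x) = (alpha x + beta) / (gamma x + delta), as in (2.8)."""
    return (alpha * x + beta) / (gamma * x + delta)


def sensitivity_value(x, alpha, beta, gamma, delta):
    """f'(x) = (alpha delta - beta gamma) / (gamma x + delta)^2, as in (2.9)."""
    return (alpha * delta - beta * gamma) / (gamma * x + delta) ** 2


# Constants built from the fixed parameters of the hypothetical network.
p_a, q = 0.3, 0.2
alpha, beta, gamma, delta = p_a, 0.0, p_a, (1 - p_a) * q

x0 = 0.8
f_x0 = sensitivity_function(x0, alpha, beta, gamma, delta)
slope_x0 = sensitivity_value(x0, alpha, beta, gamma, delta)

# Central finite difference of the directly computed posterior, for comparison.
h = 1e-6
numerical_slope = (posterior(x0 + h) - posterior(x0 - h)) / (2 * h)
print(f_x0, slope_x0, numerical_slope)
```

A large absolute sensitivity value at the current estimate of x indicates that small errors in that parameter translate into large changes in the target probability, so the parameter is a candidate for further elicitation effort.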

2.8. Review of BN Modelling

Several alternative modelling approaches to BNs may be suitable depending on the context of the issue of interest. Some options are stochastic Petri nets, decision trees, fault trees and reliability block diagrams (Wilson et al., 2006; Ahmed et al., 2009). In a decision tree, there is one node at the top of the tree and each subsequent level is split according to the decision which has been taken (Janssens et al., 2006). Unlike decision trees, BNs are able to represent interactions between factors. Furthermore, Janssens et al. (2006) found that BNs outperformed decision trees. A stochastic Petri net (SPN) may be used to model concurrent systems (Angeli et al., 2007), and fault trees or reliability blocks to model system reliability (Wilson et al., 2006). Implementing an SPN is not trivial and requires statistical knowledge as well as some familiarity with stochastic process theory and Monte Carlo simulation techniques (Goss and Peccoud, 1998).

Uusitalo (2007) reviewed the advantages and disadvantages of using Bayesian networks in environmental modelling. She believes that BNs are likely to become established as a standard method of analysis in the environmental community when dealing with uncertainty. However, she warns that BN software is still evolving in this field and that new algorithms which have been defined and accepted in BN methodology may not yet be implemented in the software. BNs do not deal with continuous data very well, often requiring continuous variables to be discretized. Nonetheless, she commends the ability of BNs to engage stakeholders, researchers and decision makers. She comments on the unsuitability of BNs for complex domains, but it should be noted that OOBNs have largely overcome this constraint (Koller and Pfeffer, 1997). She concludes that BNs are an important tool for the environmental modeller, growing in popularity, and recommends that the environmental modeller becomes familiar with them.
For more detail on this review, we refer the reader to Uusitalo (2007). Other reviews have been conducted by Wilson et al. (2006) and Ahmed et al. (2009), which include comparisons of BNs with alternative statistical methods. The review by Uusitalo (2007) also contains a summary of several software applications which can be used to design BN models. For the research conducted in this thesis, Hugin® was chosen as the preferred software tool, due mainly to its OO capabilities, which match the aims of this thesis. As is evident in the literature review of BNs, they have been used extensively and successfully in many diverse areas. The benefits of this approach have more recently been realised by the ecological community, with new research being published on the use of BNs in modelling environmental issues and systems.

2.9. Population Viability Analysis

Models for population viability analysis can generally be divided into two groups: mathematical and computational models (Drake et al., 2008). Mathematical models are primarily used for analysing population dynamics, for example in the study by van Vuuren et al. (2005) of the lions in Kgalagadi Transfrontier Park on the border of South Africa and Botswana. Computational models, such as the one used in the study of the viability of the Serengeti cheetah population (Kelly and Durant, 2000), deal with extinction risk assessment and are able to represent more complex systems (Drake et al., 2008). This capability typically comes at the expense of an intuitive understanding of the key population drivers (Drake et al., 2008).

As described above, Bayesian networks are able to represent complex model structures and interactions while retaining the simplicity of model interpretation and understanding (Jensen and Nielsen, 2007; Johnson et al., 2009). In a study by Marcot et al. (2001), BNs were successfully used to evaluate fish and wildlife population viability for different land management practices. BNs were also used in landscape models for wolverines in the interior northwest of the United States of America (Rowland et al., 2003). BNs are conducive to the identification of key environmental variables and the specification of interactions between those variables (Marcot et al., 2001). Marcot et al. (2001) considered a BN as representing an expert’s mind map, able to incorporate both expert opinion and empirical data. Furthermore, if experts disagree on the structure and interactions, the different structures can be represented and evaluated using the available empirical data to compare their relative predictive accuracy (Marcot et al., 2001).

2.10. Cheetah Conservation

Over the past century, cheetahs (Acinonyx jubatus) have undergone a drastic reduction in both global geographic range and population size (Marker et al., 2007). In southern Africa the two main free-ranging populations are in Namibia and Botswana, where the majority of cheetahs are found outside protected areas (Purchase et al., 2007). Here cheetah conservation is focussed primarily on these free-ranging populations. With the steady increase in human population size, and in the associated livestock and valuable trophy species, there is also heightened potential for conflict between humans and cheetahs (Purchase et al., 2007). In neighbouring South Africa, the majority of cheetahs are found in fenced reserves (Marnewick et al., 2007). Managing this metapopulation of cheetahs in small fenced reserves is becoming a challenge, requiring intensive intervention to prevent inbreeding and local overpopulation and to ensure the long-term sustainability of the metapopulation (Johnson et al., unpublished).

2.11. Harmful Algal Blooms

Although algal blooms have occurred worldwide for at least the last century (Schrope, 2008), it is widely recognised that the past decade has seen an increase in the occurrence, severity and extent of harmful algal blooms (HABs) (Heisler et al., 2008; Johnson et al., 2009). Their adverse impact on the environment and the associated health and economic implications are also well known and documented (Slobodkin, 1953; Heisler et al., 2008; Schrope, 2008). Lyngbya majuscula, a cyanobacterium (blue-green alga), is embedded in this larger context of harmful algal blooms. HABs include blooms of dinoflagellates (such as Karenia brevis and Gymnodinium breve) and Trichodesmium, which cause a discolouration of the sea known as a red tide (Wyatt and Horwood, 1973; Schrope, 2008).

There is a wealth of literature available on HABs, with a good review of the biological characteristics of a red tide written by Le Fèvre (1986). More recently this has been complemented by a comprehensive review of red tides in the Gulf of Mexico by Walsh et al. (2006). The two schools of thought on the main cause of a red tide were the nutrient theory and the hydrographic theory (Wyatt and Horwood, 1973; Le Fèvre, 1986). The role of nutrients in both the initiation and sustenance of toxic algal blooms is universally acknowledged (Slobodkin, 1953; Heisler et al., 2008), but the nature of this relationship is complex, and the underlying processes and interactions are still not clearly understood despite many years of research (Slobodkin, 1953; Heisler et al., 2008; Schrope, 2008). Furthermore, the temporal and spatial behaviour of blooms adds complexity to model structure and definition (Wyatt and Horwood, 1973; Chen and Mynett, 2006). The identification of knowledge gaps is equally important, as is the continued effort to collect high quality data and monitor blooms (Le Fèvre, 1986; Chen and Mynett, 2006; Schrope, 2008).
Model development to predict the likelihood of the initiation, growth, sustenance and demise of a bloom, and to better understand the underlying scientific characteristics, processes and interactions of HABs, is vital for the prevention and management of these toxic algal blooms (Walsh et al., 2001; Heisler et al., 2008; Schrope, 2008). A myriad of models of HABs has been proposed, and a comprehensive review of these models has been written by Franks (1997). Franks (1997) grouped the types of models into four main categories: (1) simple physics models, for example the minimum water mass size for phytoplankton population growth, proposed by Kierstead and Slobodkin (1953); (2) aggregated models, for example the increase in motile algae and the decrease in non-motile algae for different grazing coefficients presented by Wyatt and Horwood (1973); (3) multispecies models, for example phytoplankton diversity proposed by Kemp and Mitsch (1979); (4) detailed physics models, for example the numerical simulation of the formation of a red tide presented by Yanagi et al. (1995).

More recent approaches include a mathematical model of the joint effects of several factors (Morozov and Petrovskii, 2000), coupled ecological and physical models to predict the location, growth and demise of blooms (Walsh et al., 2001), integrated numerical and fuzzy cellular automata modelling (Chen and Mynett, 2006) and a multispecies dynamic simulation model (Solé et al., 2006). Importantly, model outcomes confirmed the temporal nature of nutrient concentrations and algal blooms (Solé et al., 2006). Putting management actions in place, although based on incomplete knowledge, is prudent and conforms to the iterative nature of adaptive management practice (Nyberg et al., 2006; Henriksen and Barlebo, 2008). As new knowledge comes to light and is incorporated into the predictive models of HAB initiation (Franks, 1997; Walsh et al., 2001; Schrope, 2008), so management actions can be altered (Heisler et al., 2008) and models validated (Franks, 1997; Walsh et al., 2001; Schrope, 2008). This iterative nature of environmental management is well suited to the IBNDC approach to integrated BN model development described in this thesis.

2.12. References

Ahmed B A, Matheny M E, Rice P L, Clarke J R and Ogunyemi O I 2009 A comparison of methods for assessing penetrating trauma on retrospective multi-center data Journal of Biomedical Informatics 42 308-316
Angeli D, De Leenheer P and Sontag E D 2007 A Petri net approach to the study of persistence in chemical reaction networks Mathematical Biosciences 210 598-618
Bashari H, Smith C and Bosch O J H 2009 Developing decision support tools for rangeland management by combining state and transition models and Bayesian belief networks Agricultural Systems 99 23-34
Basili V R, Briand L C and Melo W L 1996 A validation of object-oriented design metrics as quality indicators IEEE Transactions on Software Engineering 22 751-761
Borsuk M E, Reichert P, Peter A, Schager E and Burkhardt-Holm P 2006 Assessing the decline of brown trout (Salmo trutta) in Swiss rivers using a Bayesian probability network Ecological Modelling 192 224-244
Bromley J, Jackson N A, Clymer O J, Giacomello A M and Jensen F V 2005 The use of Hugin to develop Bayesian networks as an aid to integrated water resource planning Environmental Modelling & Software 20 231-42
Castelletti A and Soncini-Sessa R 2007 Bayesian networks and participatory modelling in water resource management Environmental Modelling and Software 22 1075-1088
Chen Q and Mynett A E 2006 Modelling algal blooms in the Dutch coastal waters by integrated numerical and fuzzy cellular automata approaches Ecological Modelling 199 73-81
Chidamber S R and Kemerer C F 1994 A Metrics suite for Object-Oriented Design IEEE Transactions on Software Engineering 20 476-493
Drake J M, Sven Erik J and Brian F 2008 Population Viability Analysis, Encyclopedia of Ecology, (Oxford: Academic Press) pp 2901-2907
Franks P J S 1997 Models of Harmful Algal Blooms Limnology and Oceanography 42 1273-1282
Ghahramani Z 2001 An Introduction to Hidden Markov Models and Bayesian Networks International Journal of Pattern Recognition and Artificial Intelligence 15 9-42
Goss P J E and Peccoud J 1998 Quantitative Modeling of Stochastic Systems in Molecular Biology by Using Stochastic Petri Nets Proceedings of the National Academy of Sciences of the United States of America 95 6750-6755
Grady B 1994 Object-oriented analysis and design with applications, second edition (Redwood City, California: Benjamin/Cummings Pub. Co.)
Harel D and Gery E 1997 Executable object modeling with statecharts Computer 30 31-42
Heisler J, Glibert P M, Burkholder J M, Anderson D M, Cochlan W, Dennison W C, Dortch Q, Gobler C J, Heil C A, Humphries E, Lewitus A, Magnien R, Marshall H G, Sellner K, Stockwell D A, Stoecker D K and Suddleson M 2008 Eutrophication and harmful algal blooms: A scientific consensus Harmful Algae 8 3-13
Henriksen H J and Barlebo H C 2008 Reflections on the use of Bayesian belief networks for adaptive management Journal of Environmental Management 88 1025-1036
Hugin 2007 Hugin. http://www.hugin.com, accessed on 23 October 2007
Janssens D, Wets G, Brijs T, Vanhoof K, Arentze T and Timmermans H 2006 Integrating Bayesian networks and decision trees in a sequential rule-based transportation model European Journal of Operational Research 175 16-34
Jensen F V and Nielsen T D 2007 Bayesian Networks and Decision Graphs (New York: Springer Science + Business Media, LLC)
Johnson S, Fielding F, Hamilton G and Mengersen K 2009 An Integrated Bayesian Network approach to Lyngbya majuscula bloom initiation Marine Environmental Research
Kelly M J and Durant S M 2000 Viability of the Serengeti Cheetah Population Conservation Biology 14
Kemp W M and Mitsch W J 1979 Turbulence and phytoplankton diversity: A general model of the "paradox of the plankton" Ecological Modelling 7 201-222
Kierstead H and Slobodkin L B 1953 The size of water masses containing plankton blooms Journal of Marine Research 12 141-147

Kifer M, Lausen G and Wu J 1995 Logical-Foundations of Object-Oriented and Frame-Based Languages Journal of the Association for Computing Machinery 42 741-843
Koller D and Pfeffer A 1997 Object-Oriented Bayesian Networks. In: Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-97), (Providence, Rhode Island) pp 302-313
Lauritzen S L, Dawid A P, Larsen B N and Leimer H G 1990 Independence properties of directed Markov fields Networks 20 491-505
Le Fèvre J 1986 Aspects of the Biology of Frontal Systems Advances in Marine Biology 23 163-299
Lunn D J, Thomas A, Best N and Spiegelhalter D 2000 WinBUGS - A Bayesian modelling framework: Concepts, structure, and extensibility Statistics and Computing 10 325-337
Marcot B G, Holthausen R S, Raphael M G, Rowland M M and Wisdom M J 2001 Using Bayesian belief networks to evaluate fish and wildlife population viability under land management alternatives from an environmental impact statement Forest Ecology and Management 153 29-42
Marcot B G, Steventon J D, Sutherland G D and McCann R K 2006 Guidelines for developing and updating Bayesian belief networks applied to ecological modeling and conservation Canadian Journal of Forest Research 36 3063-3074
Marker L, Dickman A, Wilkinson C, Schumann B and Fabiano E 2007 The Namibian Cheetah: Status Report CAT News, Special Edition 3 4-13
McCann R K, Marcot B G and Ellis R 2006 Bayesian belief networks: applications in ecology and natural resource management Canadian Journal of Forest Research 36 3053
Molina M, Fuentetaja R and Garrote L 2005 Hydrologic models for emergency decision support using Bayesian networks, Symbolic and Quantitative Approaches to Reasoning with Uncertainty, Proceedings Lecture Notes in Computer Science, (Berlin: Springer-Verlag) pp 88-99
Monarchi D E and Puhr G I 1992 A Research Typology for Object-Oriented Analysis and Design Communications of the ACM 35 35-47
Morozov A Y and Petrovskii S V 2000 Mathematical Modeling of the Initial Stage of a "Red Tide" Accounting for the Joint Effect of Various Factors Oceanology 40 356-362
Murphy K P 2002 Dynamic Bayesian Networks: Representation, Inference and Learning. In: Graduate Division, (Berkeley: University of California) p 225
Neapolitan R 1990 Probabilistic Reasoning in Expert Systems: Theory and Algorithms (New York: John Wiley)
Nelson M T, Humphrey W, Gursoy A, Dalke A, Kale L V, Skeel R D and Schulten K 1996 NAMD: A parallel, object oriented molecular dynamics program International Journal of Supercomputer Applications and High Performance Computing 10 251-268
Nyberg J B, Marcot B G and Sulyma R 2006 Using Bayesian belief networks in adaptive management Canadian Journal of Forest Research-Revue Canadienne De Recherche Forestiere 36 3104-3116
Park M-H and Stenstrom M K 2008 Classifying environmentally significant urban land uses with satellite imagery Journal of Environmental Management 86 181-192
Pastor O, Gomez J, Insfran E and Pelechano V 2001 The OO-Method approach for information systems modeling: from object-oriented conceptual modeling to automated programming Information Systems 26 507-534
Pearl J 1988 Probabilistic Reasoning in Intelligent Systems (San Francisco, California: Morgan Kaufmann Publishers Inc)
Pollino C A, White A K and Hart B T 2007 Examination of conflicts and improved strategies for the management of an endangered Eucalypt species using Bayesian networks Ecological Modelling 201 37-59
Purchase G K, Marker L, Marnewick K, Klein R and Williams S 2007 Regional assessment of the status, distribution and conservation needs of cheetahs (Acinonyx jubatus) in southern Africa: Summary of Country Reports CAT News, Special Edition 3 44-46
Riel A J 1996 Object-Oriented Design Heuristics (Reading, Mass: Addison-Wesley Professional)
Ross B J and Zuviria E 2007 Evolving dynamic Bayesian networks with multi-objective genetic algorithms Applied Intelligence 26 13-23
Rowland M M, Wisdom M J, Johnson D H, Wales B C, Copeland J P and Edelmann F B 2003 Evaluation of landscape models for wolverines in the interior northwest, United States of America Journal of Mammalogy 84 92-105
Sadoddin A, Letcher R A, Jakeman A J and Newham L T H 2005 A Bayesian decision network approach for assessing the ecological impacts of salinity management Mathematics and Computers in Simulation 69 162-76
Schrope M 2008 Oceanography: Red Tide rising Nature 452 24-26
Skillicorn D B and Talia D 1998 Models and languages for parallel computation ACM Computing Surveys 30 123-169
Slobodkin L B 1953 A possible initial condition for Red Tides on the Coast of Florida Journal of Marine Research 12 148-155
Smith C S, Howes A L, Price B and McAlpine C A 2007b Using a Bayesian belief network to predict suitable habitat of an endangered mammal - The Julia Creek dunnart (Sminthopsis douglasi) Biological Conservation 139 333-347
Solé J, Estrada M and Garcia-Ladona E 2006 Biological control of harmful algal blooms: A modelling study Journal of Marine Systems 61 165-179

Steimann F 2000 On the representation of roles in object-oriented and conceptual modelling Data & Knowledge Engineering 35 83-106
Taroni F, Aitken C, Garbolino P and Biedermann A 2006 Bayesian Networks and Probabilistic Inference in Forensic Science (Chichester: John Wiley & Sons, Ltd)
Uusitalo L 2007 Advantages and challenges of Bayesian networks in environmental modelling Ecological Modelling 203 312-8
van Vuuren J H, Herrmann E and Funston P J 2005 Lions in the Kgalagadi Transfrontier Park: modelling the effect of human-caused mortality International Federation of Operational Research Societies 12 145-171
Walsh J J, Jolliff J K, Darrow B P, Lenes J M, Milroy S P, Remsen A, Dieterle D A, Carder K L, Chen F R, Vargo G A, Weisberg R H, Fanning K A, Muller-Karger F E, Shinn E, Steidinger K A, Heil C A, Tomas C R, Prospero J S, Lee T N, Kirkpatrick G J, Whitledge T E, Stockwell D A, Villareal T A, Jochens A E and Bontempi P S 2006 Red tides in the Gulf of Mexico: Where, when, and why? - art. no. C11003 Journal of Geophysical Research-Oceans 111 11003-11003
Walsh J J, Penta B, Dieterle D A and Bissett W P 2001 Predictive Ecological Modeling of Harmful Algal Blooms Human and Ecological Risk Assessment 7 1369-1383
Weber P and Jouffe L 2006 Complex system reliability modelling with Dynamic Object Oriented Bayesian Networks (DOOBN) Reliability Engineering & System Safety 91 149-62
Wilson A G, Graves T L, Hamada M S and Reese C S 2006 Advances in Data Combination, Analysis and Collection for System Reliability Assessment Statistical Science 21 514-531
Wyatt T and Horwood J 1973 Model which Generates Red Tides Nature 244 238-240


Chapter 3 An Integrated Bayesian Network Approach to Lyngbya majuscula Bloom Initiation

This chapter has been written as a journal article¹ and is presented in its entirety. In this chapter we aim to demonstrate the application of Bayesian Network (BN) methodology to a complex ecological problem, Lyngbya bloom initiation, and to illustrate how it can be used to integrate models for different aspects of the same issue. This is achieved by consolidating three disparate statistical models into an integrated Bayesian network (IBN). The management network quantifies a management action in the catchment area, and this action is then simulated through a catchment model to obtain the modified nutrient loads. The BN which describes the scientific factors affecting the initiation of a Lyngbya bloom (Science BN) is then updated with the modified loads and recompiled to forecast the effect on the probability of a Lyngbya bloom initiation. The integrated model confirmed the temporal nature of Lyngbya. Using an IBN approach in an object oriented framework, we show how Object Oriented BNs (OOBN) and Dynamic OOBNs facilitate an integrated approach to modelling ecological issues of concern. Furthermore, the merger of multiple models which explore different aspects of the problem through an IBN approach can apply to many multi-faceted environmental problems.

1 The journal article has recently been accepted for publication in Marine Environmental Research:

An Integrated Bayesian Network Approach to Lyngbya majuscula Bloom Initiation
Sandra Johnson*, Fiona Fielding, Grant Hamilton, Kerrie Mengersen
*School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia


An Integrated Bayesian Network Approach to Lyngbya majuscula Bloom Initiation

Abstract

Blooms of the cyanobacterium Lyngbya majuscula have occurred for decades around the world. However, with the increase in size and frequency of these blooms, coupled with the toxicity of such algae and their increased biomass, they have become substantial environmental and health issues. It is therefore imperative to develop a better understanding of the scientific and management factors impacting on Lyngbya bloom initiation. This paper suggests an Integrated Bayesian Network (IBN) approach that facilitates the merger of the research being conducted by various parties on Lyngbya. Pivotal to this approach are two Bayesian networks modelling the management and scientific factors of bloom initiation. The research found that Bayesian networks (BN), and specifically Object Oriented BNs (OOBN) and Dynamic OOBNs, facilitate an integrated approach to modelling ecological issues of concern. The merger of multiple models which explore different aspects of the problem through an IBN approach can apply to many multifaceted environmental problems.

Keywords: Bayesian network, cyanobacteria, DOOBN, dynamic, IBN, Lyngbya majuscula, object oriented, OOBN


3.1 Introduction

Lyngbya majuscula is a cyanobacterium (blue-green algae) occurring naturally in tropical and subtropical coastal areas worldwide (Dennison et al., 1999; Osborne et al., 2001; Arquitt and Johnstone, 2004), including Moreton Bay in Queensland, Australia. Lyngbya grows on the sediment or over the seagrass, algae or coral (Dennison and Abal, 1999; Watkinson et al., 2005) and when the conditions are favourable, the algae goes through a rapid growth phase, resulting in a substantial increase in biomass, commonly referred to as a bloom (Ahern et al., 2007; Hamilton et al., 2007b). Lyngbya blooms appear to be increasing in both frequency and extent (Dennison and Abal, 1999; Albert et al., 2005; Ahern et al., 2007), which can have major ecological (Stielow and Ballantine, 2003; Paul et al., 2005; Watkinson et al., 2005), health (Osborne et al., 2001; Osborne et al., 2007) and economic consequences (Dennison and Abal, 1999). It is therefore imperative to better understand the scientific and management factors that drive the initiation of L. majuscula blooms. Deception Bay, located in Northern Moreton Bay in Queensland, Australia, has a history of Lyngbya blooms (Watkinson et al., 2005; Ahern et al., 2007) and forms a case study for this investigation. With its proximity to Brisbane, Australia’s third largest city with an estimated population in 2004 of 1.78 million (ABS, 2004), it is a popular tourist destination. The many waterways feeding from intensive and rural agricultural activities into the Bay and its use for commercial and recreational fishing, put pressure on the marine environment and compound the issues resulting from a nuisance algal bloom (Dennison and Abal, 1999). A modelling approach was required to identify the high priority research that needed to be undertaken into the poorly known features of Lyngbya initiation. 
Therefore it was necessary to capture and represent all the available data and expert knowledge about the initiation of Lyngbya blooms in Deception Bay. This approach had to engage stakeholders, represent the available information at different spatial and temporal scales, identify scientific and management factors affecting Lyngbya initiation and quantify the factors and their inter-dependencies. Moreover, the stakeholders were particularly diverse, comprising ecologists and scientists familiar with Lyngbya and the factors that affect its bloom, state and local government representatives, committee members of local organisations, as well as individuals with an active interest in Lyngbya, including a third generation local fisherman with decades of accumulated knowledge of Lyngbya blooms in the Bay.

There are several modelling approaches that could be considered for such a problem, including decision trees, stochastic Petri nets and Bayesian networks. A decision tree takes a “top-down approach”. The first factor (root node) at the top of the tree is split according to the decision taken. Each subsequent node is then split in a similar way (Janssens et al., 2006). This approach lacked the ability to represent the many interactions between the factors which would be needed to model the initiation of a Lyngbya bloom. A stochastic Petri net (SPN), also known as a place/transition net, is used to model concurrent systems (Angeli et al., 2007). Implementing an SPN is not trivial, even with the use of bespoke software. It mandates some statistical knowledge as well as some familiarity with stochastic process theory and Monte Carlo simulation techniques (Goss and Peccoud, 1998).

A Bayesian Network (BN) provides a graphical representation of key factors, which are represented as nodes in the diagram. Their causal relationships with each other and with the outcome of interest (Borsuk et al., 2006; McCann et al., 2006; Jensen and Nielsen, 2007; Uusitalo, 2007) are depicted as directed links or arrows connecting a ‘parent node’ to a ‘child node’, resulting in a directed acyclic graph (DAG) (Saddo et al., 2005; Jensen and Nielsen, 2007; Uusitalo, 2007; Park and Stenstrom, 2008). BNs are better able to portray the complexity of the decision process and the many inter-dependencies between the factors of the decision process (Janssens et al., 2006). Moreover, they are visually appealing and easy to use, comprehend and interact with. For more detailed information about the advantages and disadvantages of BNs and comparisons with alternative statistical methods, we refer the reader to Wilson et al. (2006), Uusitalo (2007) and Ahmed et al. (2009).

Bayesian networks have been used successfully to better understand and model many complex environmental problems (Bromley et al., 2005). They facilitate the representation of different management decisions and scenarios that may impact on the environmental issue being modelled and the consequences of these situations and actions (McCann et al., 2006; Uusitalo, 2007). However, the focus of many networks is often on a single aspect of the outcome, and multi-faceted inferential needs are most commonly addressed through multiple independent networks.
This paper describes an approach to integrating diverse knowledge about Lyngbya bloom initiation in the Deception Bay area, by developing an Integrated Bayesian Network (IBN). The IBN comprises a series of BNs designed to conceptualize and quantify the major factors and their pathways contributing to the initiation of Lyngbya, from both scientific and management perspectives. In Figure 3.1 a unified modelling language (UML) use case diagram illustrates the conceptual processes of the Lyngbya IBN. To our knowledge an IBN approach has not previously been applied to Lyngbya bloom initiation. In Section 3.2 we describe the characteristics of a traditional BN, an object oriented BN (OOBN) and the natural progression to a dynamic OOBN (DOOBN). We then introduce the integrated BN approach (IBN) which consolidates the information held in various networks and models. We present the results of this approach in Section 3.3 by applying it to the initiation of Lyngbya blooms.


Figure 3.1: UML use case diagram of the conceptual processes in the Lyngbya bloom initiation Integrated Network

3.2 Methods

3.2.1 Bayesian Network (BN)

As described in Section 3.1, a BN visualizes knowledge about an ecological issue of interest with the important factors depicted as nodes in the network. These nodes may be at different temporal and spatial scales and the data represented in the BN may originate from diverse sources such as empirical data, expert opinion and simulation outputs (Saddo et al., 2005; Borsuk et al., 2006; McCann et al., 2006; Jensen and Nielsen, 2007; Pollino et al., 2007; Park and Stenstrom, 2008). In the case of the Lyngbya network, the outcome of interest is the initiation of a Lyngbya bloom. Each node of the network is described by a set of states (for example high/medium/low, adequate/inadequate) and quantified by associating a probability table with each node. The probability table is determined by these states and the states of the nodes that influence it. An example is the conditional probability table (CPT) for the Bottom Current Climate node, shown in Table 3.1, which has two states (Low and High) and has three parent nodes that influence it (Wind Direction, Wind Speed and Tide) (Saddo et al., 2005; Pollino et al., 2007; Park and Stenstrom, 2008).


Table 3.1: Conditional probability table for Bottom Current Climate node with states Low and High and parent nodes Wind Direction (states North, SE and Other), Wind Speed (states Low and High) and Tide (states Spring and Neap). These nodes, their states, probabilities and relationships are visible in the Bayesian network in figure 3.4.

Wind Direction   Wind Speed   Tide     Low    High
North            Low          Spring   0.33   0.67
North            Low          Neap     0.61   0.39
North            High         Spring   0.43   0.57
North            High         Neap     0.54   0.46
SE               Low          Spring   0.42   0.58
SE               Low          Neap     0.58   0.42
SE               High         Spring   0.37   0.63
SE               High         Neap     0.59   0.41
Other            Low          Spring   0.39   0.61
Other            Low          Neap     0.59   0.41
Other            High         Spring   0.43   0.57
Other            High         Neap     0.50   0.50
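To make the role of a CPT concrete, the following minimal Python sketch stores the Low column of Table 3.1 and marginalises over the parent nodes. The uniform, mutually independent parent priors are an assumption made purely for illustration; they are not taken from the Science Network, and the names `P_LOW`, `PRIORS` and `prob_low` are hypothetical.

```python
from itertools import product

# P(Bottom Current Climate = Low | Wind Direction, Wind Speed, Tide), from Table 3.1.
P_LOW = {
    ("North", "Low", "Spring"): 0.33, ("North", "Low", "Neap"): 0.61,
    ("North", "High", "Spring"): 0.43, ("North", "High", "Neap"): 0.54,
    ("SE", "Low", "Spring"): 0.42, ("SE", "Low", "Neap"): 0.58,
    ("SE", "High", "Spring"): 0.37, ("SE", "High", "Neap"): 0.59,
    ("Other", "Low", "Spring"): 0.39, ("Other", "Low", "Neap"): 0.59,
    ("Other", "High", "Spring"): 0.43, ("Other", "High", "Neap"): 0.50,
}

# ASSUMPTION: uniform, independent parent priors (illustrative, not from the thesis).
PRIORS = {
    "direction": {"North": 1 / 3, "SE": 1 / 3, "Other": 1 / 3},
    "speed": {"Low": 0.5, "High": 0.5},
    "tide": {"Spring": 0.5, "Neap": 0.5},
}


def prob_low(evidence=None):
    """P(Bottom Current Climate = Low), optionally given observed parent states."""
    evidence = evidence or {}
    total = 0.0
    weight_sum = 0.0
    for d, s, t in product(PRIORS["direction"], PRIORS["speed"], PRIORS["tide"]):
        state = {"direction": d, "speed": s, "tide": t}
        # Skip parent configurations that contradict the entered evidence.
        if any(state[name] != value for name, value in evidence.items()):
            continue
        weight = PRIORS["direction"][d] * PRIORS["speed"][s] * PRIORS["tide"][t]
        total += weight * P_LOW[(d, s, t)]
        weight_sum += weight
    return total / weight_sum
```

Calling `prob_low({"tide": "Spring"})` mimics entering evidence on one parent node, whereas fixing all three parents simply returns the corresponding table entry.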

Two important characteristics of a BN which also simplify probability calculations are directional separation (d-separation) and the assumption of the Markov property (Jensen and Nielsen, 2007). The criterion for d-separation was first proposed by Pearl (1988) and an alternative criterion was specified by Lauritzen et al. (1990). If nodes are d-separated then they are conditionally independent (Kjaerulff, 1995; Taroni et al., 2006). The Markov property means that the probability distribution of a variable depends only on its parents. Consequently from the multiplication law of elementary probability theory, the conditional independence (d-separation) and the Markov property enable the probability distribution of a BN with n nodes $X_1, \ldots, X_n$ to be factorized as follows:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)$$

where $\mathrm{Pa}(X_i)$ is the set of parents of node $X_i$.
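As a worked illustration, consider only the four nodes of Table 3.1: Wind Direction ($W$), Wind Speed ($S$), Tide ($T$) and Bottom Current Climate ($B$). Assuming, purely for illustration, that $W$, $S$ and $T$ have no parents within this fragment (the full Science Network may differ), the factorization reduces to one term per node, and the only conditional term is the CPT of Table 3.1:

```latex
\begin{align*}
P(W, S, T, B) &= P(W)\, P(S)\, P(T)\, P(B \mid W, S, T) \\
P(W{=}\text{North},\, S{=}\text{Low},\, T{=}\text{Spring},\, B{=}\text{Low})
  &= P(W{=}\text{North})\, P(S{=}\text{Low})\, P(T{=}\text{Spring}) \times 0.33
\end{align*}
```

The value 0.33 is the first row of Table 3.1; the three prior terms would come from the probability tables of the parent nodes.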

This greatly simplifies calculations of the joint probability distribution and allows us to focus on each node in turn to combine the expertise and data available for that node and its parents.

The BNs described in this paper were developed as part of a larger study of the major factors and their pathways contributing to the initiation of Lyngbya blooms. They were constructed in close collaboration with a Lyngbya Science Working Group (LSWG) drawn from a range of disciplines and a Lyngbya Management Working Group (LMWG) drawn from local and state government and private organisations (Abal et al., 2005).

The Science Network focused on nutrient and physical factors that were agreed by the LSWG to be the most influential contributors to the initiation of Lyngbya. To construct the Science Network, numerous meetings were convened to determine the most important factors that were believed to have an impact on the ecosystem surrounding Deception Bay. Once the initial structure was agreed upon, the factors were then clearly defined. This was necessary so that, throughout the process, all involved could refer to these definitions and agree on the focus of each factor. The initial Lyngbya Science BN was then colour coded into six logical groups of coherent nodes. The groups are Water (containing nodes Rain-present, No prev dry days, Groundwater Amount and Land Run-off Load), Sea Water (containing Tide, Turbidity and Bottom Current Climate), Air (containing nodes Wind and Wind Speed), Light (containing nodes Light Quantity, Light Quality and Light Climate), Nutrients (containing nodes Dissolved Fe Concentration, Dissolved Organics, Dissolved N Concentration, Dissolved P Concentration, Particulates (Nutrients), Sediments Nutrient Climate, Point Sources and Available Nutrient Pool), and Lyngbya Algae (containing only the target node Bloom Initiation).

Thereafter the network was quantified by populating a conditional probability table for each node, based on the factors affecting that node. For example, the probability of low or high Bottom Current Climate was determined for different states of its parent nodes, Wind Direction (north, south-east or other), Wind Speed (low or high) and Tide (spring or neap), as shown in Table 3.1. The CPTs were populated in this way using data obtained from expert elicitation, output from simulation models and statistical models, and data obtained from monitoring sites and government agencies. The datasets spanned different time periods ranging from one season to several years, depending on availability and applicability. Meta-data on these datasets were compiled as a key component of the project. The meta-database, comprising the source, ownership, type of data and dates collected, is summarised in a Healthy Waterways report (Fielding et al., 2007), and is available on the organisation’s website.
Validation of the BN was assessed in three ways: through sensitivity analysis, outcomes comparison and scenario testing. Sensitivity analysis is a popular technique in mathematical modelling and the field of decision theory to investigate uncertainty in a model’s parameters and their effect on the model output (Hamby, 1994; Coupe et al., 2000). For BNs this means studying the changes in the probabilities of the target node as a result of changes in the network’s CPT values (Coupe et al., 2000). In the Science BN the probabilities of one node were varied while the others were kept fixed, and the resulting changes in the probability of a Lyngbya bloom initiation were observed. Sensitivity analysis is considered crucial to model validation and for targeting further research (Hamby, 1994). It is performed on the BN model to reduce uncertainty in the target node and to identify those nodes that have the largest impact on the target node (Hamby, 1994; Coupe et al., 2000). Additional research effort can then be directed to the quantification of those nodes (Bednarski et al., 2004).

Outcomes comparison involves comparison between external data and model predictions. In the case of the Lyngbya Science BN, no such external data were available since all known available data had been used to populate aspects of the BN. Moreover, for any observed Lyngbya outbreak, data were not available for the complete set of nodes in the BN model. As a result, a more limited outcomes comparison was undertaken through scenario testing, in which selected scenarios reflected known conditions associated with documented initiation or lack of initiation of Lyngbya outbreaks in the last 30 years.

Scenario testing is important to investigate model behaviour for different expert defined scenarios, assessing whether the model behaves as expected in light of past experience and in accordance with current credible research (Laskey, 1995; Bednarski et al., 2004). The expert team therefore nominated scenarios of interest and evidence was entered into the BN to represent these scenarios. The relevant nodes were updated to reflect the proposed scenario and this evidence was propagated through the BN to update the probability of a Lyngbya bloom initiation under those conditions (Laskey, 1995; Bednarski et al., 2004). For example, evidence of ‘best practice’ was entered into the BN by setting the Point sources node to low. Further sensitivity analysis was then performed on the other nodes in the BN to observe the sensitivity of the target node to changes in node probabilities for that scenario.

The Management Network focused on management inputs that potentially influence the delivery of nutrients to the Bay and was constructed through a series of meetings with the LMWG. The potential nutrient sources that were identified by the LMWG were split into point sources (coming from a relatively concentrated area, e.g. waste water treatment plants) and diffuse sources (nutrients being contributed to the water catchment from a larger geographical area, e.g. grazing land). The management model has evolved into a graphical representation of the catchment area showing the waterways and identifying the location of the sources within the catchment as well as their nutrient contributions.
Participants of the LMWG identified the existing, committed and best practice management options for each source. The network was then quantified by assigning probabilities to each node to reflect the probability of low or high nutrient discharge for each source under each management option.

3.2.2 Object Oriented Bayesian Network (OOBN)

The basic building block in an Object Oriented Bayesian Network (OOBN) is an object, which can be a physical or an abstract entity, or a relationship between two entities. Typically, an entity comprises one or more nodes in a BN that are related in a physical, functional or abstract sense. From a probabilistic point of view, the attributes (nodes and links) are encapsulated in an object and therefore d-separated from the rest of the network. The definition of classes of objects in OOBNs enables a more generic, reusable network to be described, which can then be used in different contexts. A class is a generic network fragment and when this class is instantiated it is called an object. A class may be instantiated many times (Jensen and Nielsen, 2007). It is not uncommon for several classes to share common substructures. These subclasses can inherit many attributes and behaviours from the parent class, which they can then modify and enhance.

The parent class can be viewed as being more abstract than its subclasses with only the important details being retained, whereas subclasses define more specific attributes and behaviours. The ability to create subclasses that inherit properties from another class is a well known and very useful characteristic of object oriented modelling (Koller and Pfeffer, 1997). Applying this object oriented approach to BN modelling, an OOBN can be instantiated within another OOBN. An instantiated OOBN is called an instance node and represents an instance of another network, which in turn could contain instance nodes. Connectivity between these OOBNs is achieved through interface nodes (input nodes and/or output nodes) (Hugin, 2007; Jensen and Nielsen, 2007). OOBNs therefore enable a more structured, hierarchical approach to modelling and consequently the construction of complex and dynamic models (Koller and Pfeffer, 1997; Hepler and Weir, 2008).

The groups of nodes defined in Section 3.2.1 for the Science Network formed the basis for the creation of OOBN sub-networks, for example the ‘Dissolved Elements subnet’ and the ‘Light subnet’. Thereafter the interface nodes were identified and added to the sub-networks to facilitate the transfer of information and evidence into and out of the sub-networks. The OOBN sub-networks were then linked via the interface nodes to recreate the Lyngbya Science network. The new structure facilitated the independent parallel development and interrogation of the sub-networks so that they could be reintegrated into the parent network when they were deemed to be complete.
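The class, instance and interface-node vocabulary above can be sketched in a few lines of Python. This is only a structural illustration under stated assumptions: `NetworkClass`, `Instance` and `bind` are hypothetical names, not the API of Hugin or Netica, and the interfaces chosen for the Light sub-network are guesses made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NetworkClass:
    """A generic network fragment (an OOBN class) with named interface nodes."""
    name: str
    nodes: tuple
    inputs: tuple = ()    # placeholder nodes filled from outside the fragment
    outputs: tuple = ()   # nodes made visible to other networks

    def instantiate(self, label):
        """Create one instance node of this class."""
        return Instance(label, self)


@dataclass
class Instance:
    """An instance node: one instantiation of a NetworkClass inside another network."""
    label: str
    cls: NetworkClass
    bindings: dict = field(default_factory=dict)

    def bind(self, input_name, source, output_name):
        """Connect an output node of `source` to one of this instance's input nodes."""
        if input_name not in self.cls.inputs:
            raise ValueError(f"{input_name!r} is not an input node of {self.cls.name}")
        if output_name not in source.cls.outputs:
            raise ValueError(f"{output_name!r} is not an output node of {source.cls.name}")
        self.bindings[input_name] = (source.label, output_name)


# ASSUMPTION: hypothetical interfaces, loosely based on the Light group of nodes.
light_class = NetworkClass(
    name="Light subnet",
    nodes=("Light Quantity", "Light Quality", "Light Climate"),
    outputs=("Light Climate",),
)
science_class = NetworkClass(
    name="Science network",
    nodes=("Light Climate", "Bloom Initiation"),
    inputs=("Light Climate",),
)

light = light_class.instantiate("light_1")
science = science_class.instantiate("science_1")
science.bind("Light Climate", light, "Light Climate")
```

Because the sub-network is only reachable through its declared interface, its internal nodes can be revised independently, which is the property that supports parallel development by separate expert teams.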

3.2.3 Dynamic Object Oriented Bayesian Network (DOOBN)

The temporal behaviour of a network can be represented by time slices, one for each period of interest. The resulting network, consisting of several OOBN time slices, is referred to as a dynamic OOBN (DOOBN) (Kjaerulff, 1995; Weber and Jouffe, 2006). Lyngbya blooms in Moreton Bay occur more frequently during the summer months when conditions are more favourable for bloom initiation (Watkinson et al., 2005). Additional statistical modelling was conducted by Hamilton et al. (2007a) on the effects of temperature, rainfall and light on L. majuscula blooms. The importance of groundwater in stimulating Lyngbya blooms has been studied by Ahern et al. (2006), and groundwater was nominated by the LSWG as a key node that may exhibit temporal behaviour. It was thus considered that the DOOBN would be better able to predict the probability of Lyngbya bloom initiation. A UML use case diagram illustrating the processes involved in creating this DOOBN is shown in figure 3.2. The initial static Science BN model used annual averages for rainfall and temperature, but captured some temporal behaviour by introducing a node to represent the number of previous dry days. As directed by the LSWG, the Lyngbya Science BN was extended to incorporate the temporal nature of L. majuscula to create a DOOBN with five time slices (one for each of the months of November to March).


Figure 3.2: UML use case diagram of the processes for the Lyngbya bloom initiation DOOBN

The DOOBN is therefore able to predict the probability of a Lyngbya bloom initiation by incorporating specific monthly data while also taking into account the influence of the previous month. Using Bayesian statistical modelling Hamilton et al. (2007a) investigated the response of Lyngbya bloom initiation to temporal factors such as average minimum and maximum monthly temperature, monthly rainfall, average monthly solar exposure and average monthly clear sky (the inverse of cloud cover). One month time lags and interaction terms were also included for rain and minimum temperature. From a total of 890 models evaluated, the single term average minimum monthly temperature model (with an intercept term) had the best predictive behaviour. Rainfall at a lag of one month was the only other variable that appeared in the top five identified models.
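The idea of linking time slices, so that each month depends on the previous one, can be sketched as a simple forward propagation. The sketch below assumes a single two-state node (for example a groundwater-like node) whose state depends on its own state in the previous slice and on observed rainfall in the current month. The CPT values in `P_HIGH` are invented for illustration and are not the values used in the Lyngbya DOOBN.

```python
MONTHS = ["November", "December", "January", "February", "March"]

# ASSUMPTION: hypothetical CPT, P(node = High | previous state, rain this month).
P_HIGH = {
    ("Low", "Low"): 0.1, ("Low", "High"): 0.5,
    ("High", "Low"): 0.4, ("High", "High"): 0.9,
}


def propagate(rain_by_month, p_high_initial):
    """Return P(node = High) for each monthly time slice, in calendar order.

    rain_by_month maps each month to the observed rain state ("Low" or "High");
    p_high_initial is P(node = High) entering the first time slice.
    """
    p_prev = p_high_initial
    result = {}
    for month in MONTHS:
        rain = rain_by_month[month]
        # Marginalise over the previous slice's state (the input node of this slice).
        p_high = (1 - p_prev) * P_HIGH[("Low", rain)] + p_prev * P_HIGH[("High", rain)]
        result[month] = p_high
        p_prev = p_high  # the output node feeds the next slice's input node
    return result
```

Each loop iteration plays the role of one time slice: the same network fragment is reused, and only the month-specific evidence and the value inherited from the previous slice change.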

3.2.4 Integrated Bayesian Network (IBN)

We describe here the IBN for the probability of initiation of a Lyngbya bloom. This network comprises two primary BNs, the Management Network and the Science Network described in Section 3.2.1, integrated with a Water Catchment simulation model, which was concurrently developed under the Lyngbya Programme. The IBN is conceived as a series of steps, in which the Management Network informs about nutrient discharge into the Deception Bay catchment, the Catchment model simulates the movement of these nutrients to the Lyngbya site in the Bay, and the Science model then integrates this nutrient information with other factors to determine the probability of initiation of a Lyngbya bloom. Figure 3.3 is a UML activity diagram detailing the processes of the IBN for Lyngbya bloom initiation.

In addition to providing a rich, cohesive model of Lyngbya bloom initiation from both science and management perspectives, an important use of the IBN was for scenario modelling. A set of exemplar scenarios that could impact on nutrient delivery to the Lyngbya site was proposed. These included: upgrading point sources from existing to best practice (e.g. eliminating potassium output from sewage treatment plants across the catchment), describing a climate event (e.g. a severe summer storm), and conditions least favourable for bloom initiation (e.g. low temperature and nutrients).

Figure 3.3: UML activity diagram detailing the processes for the Lyngbya bloom initiation IBN

For each proposed scenario the changes in the level of nutrients or to the factors affecting the initiation of Lyngbya in the Science network were assessed. If nutrient loads were changed, the impact on nutrient concentrations across the catchment arising from a management scenario could then be simulated through the Water Catchment model by the application of filters. The E2 software package (eWater CRC, 2007) used to create the Water Catchment model contains several pre-defined filters capable of simulating various complex management actions and adjusting the catchment load output accordingly. For example filters such as percentage removal of a nutrient and nutrient trapping may be chosen. Thereafter the Science Network was updated to reflect the modified nutrient loads and other changes related to the proposed scenario. This evidence was then propagated through the network to yield the probability of initiation of a Lyngbya bloom under the specific scenario.
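The sequence of steps in the IBN (management action, catchment filter, updated Science Network) can be sketched as a chain of functions. Everything here is a simplified stand-in: `percentage_removal` is a hypothetical function that only mimics the idea of a percentage-removal filter and is not the E2 interface, the load threshold is invented, and the Science Network is reduced to the two single-factor probabilities reported for the available nutrient pool in Section 3.3.

```python
def percentage_removal(load, percent):
    """Hypothetical stand-in for a percentage-removal filter on a nutrient load."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return load * (1 - percent / 100)


def nutrient_state(load, threshold):
    """Discretise a load into the states of the available nutrient pool node.

    ASSUMPTION: a single threshold is used here purely for illustration.
    """
    return "enough" if load >= threshold else "not enough"


# P(bloom initiation) when only the available nutrient pool is set (Section 3.3).
P_BLOOM = {"enough": 0.80, "not enough": 0.03}


def run_scenario(baseline_load, removal_percent, threshold):
    """Management action -> catchment filter -> Science Network lookup."""
    modified_load = percentage_removal(baseline_load, removal_percent)
    return P_BLOOM[nutrient_state(modified_load, threshold)]
```

In the real IBN the last step is a full propagation of evidence through the Science Network rather than a table lookup; the sketch only shows how the output of one model becomes the input of the next.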


The networks in the IBN were developed using a variety of software modelling tools. The conceptual Management Network was visually represented using the BN package Netica® (Norsys, 2007) and then interfaced with the hydrological flow and nutrient load model created in the whole of catchment simulation software package, E2 (eWater CRC, 2007), in order to identify nutrient loads reaching the Lyngbya site. The Science Network was developed entirely in Netica® and later in Hugin® (Hugin, 2007) where the network was transformed into a DOOBN by creating time slices (Kjaerulff, 1995; Weber and Jouffe, 2006; Jensen and Nielsen, 2007). In summary, the novelty factor here is that although a static BN is unable to ‘communicate’ with another BN, we can transform it to an OOBN to facilitate information flow and linkage to other OOBNs of interest. Thus we can exploit the purpose for which each model was designed to build a more comprehensive model of the environmental issue of concern.

3.3 Results

The static Science BN for initiation of Lyngbya is depicted in figure 3.4 with the nodes representing the factors identified by the LSWG as important in the initiation of a Lyngbya bloom.

Figure 3.4: Science Network for Lyngbya initiation (Netica®)

Sensitivity analysis of this BN revealed that the seven most influential factors in the Science Network were (in decreasing order of influence): available nutrient pool (dissolved), bottom current climate, sediment nutrients, dissolved iron (Fe), dissolved phosphorus (P), light and temperature. Furthermore, scenario modelling consistently identified available nutrient pool as the factor which most heavily influences the probability of initiation of a bloom. Point and diffuse sources deliver nutrients to the bay and this nutrient delivery is affected by management actions at the sources. The Science BN was also interrogated using management and climatic scenarios, analysing the effect of changes in the various factors on the probability of bloom initiation. The predicted changes in the probability of a Lyngbya bloom initiation as a result of varying each of the seven most influential factors in isolation are shown in Table 3.2. In a ‘typical’ year, as defined by the LMWG, the probability of a bloom initiation was reported as 28%; this increased substantially to 42% during a severe summer storm event, when light climate was optimal and rain-present was high. Bloom initiation was predicted as a certainty (100%) when the available nutrient pool (dissolved) was enough, temperature was high and light climate was optimal. However, when only the available nutrient pool (dissolved) was set to ‘not enough’, the probability of a bloom initiation dropped to 3%, and it jumped to 80% when this node was changed to ‘enough’. Although bottom current climate was a key influential factor, changing only this factor caused the probability of bloom initiation to drop to 15% when the bottom current climate was ‘high’ and to increase to 43% when it was ‘low’. This is a variation of 28 percentage points in the probability of a bloom initiation and, although large, it is clearly overshadowed by the variation of 77 percentage points caused by changes in nutrient availability.
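The single-factor variations quoted in the surrounding text (including the iron and organics figures given in the next paragraph) are simply the difference between the highest and lowest predicted bloom probability when one factor is varied alone. A minimal Python sketch of that arithmetic and the resulting ranking follows; the dictionary and variable names are illustrative only.

```python
# (lowest %, highest %) predicted P(bloom initiation) when one factor alone is varied,
# as reported in the text of this section.
reported = {
    "Available Nutrient Pool": (3, 80),
    "Bottom Current Climate": (15, 43),
    "Dissolved Fe": (21, 37),
    "Dissolved Organics": (25, 31),
}

# Variation in percentage points attributable to each factor in isolation.
variation = {factor: high - low for factor, (low, high) in reported.items()}

# Factors ordered from most to least influential on the target node.
ranked = sorted(variation, key=variation.get, reverse=True)
```

The computed variations for the first three factors (77, 28 and 16 percentage points) agree with the corresponding rows of Table 3.2.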
Changing iron availability alone increased the probability of bloom initiation from 21% to 37%. Changing organics availability alone increased the probability of bloom initiation from 25% to 31%.

Table 3.2: Changes to the probability of Lyngbya bloom initiation for key factors. All possible states for each of the nodes were assessed individually to ascertain the delta effect it had on the probability of a Lyngbya bloom initiation.

Factor                       Change in P(Bloom) (%)
Available Nutrient Pool      77
Bottom Current Climate       28
Sediment Nutrient Climate    17
Dissolved Fe                 16
Dissolved P                  15
Light Climate                14
Temperature                  14

Next the Science OOBN sub-network (figure 3.5) was created from the static Lyngbya Science BN as outlined in Section 2.2, retaining all the key factors (with the exception of the No of prev dry days) and their CPTs from the static BN. As is characteristic of Object Oriented networks, the Science OOBN sub-network includes instances of other sub-networks, shown in figure 3.3 as rectangles with rounded edges, such as the Wind subnet and the Turbidity subnet. Input nodes were added to the Science OOBN sub-network as placeholders for the real nodes Temperature, Rain-present, Land Run-off Load and Ground Water Amount. The sub-networks were based on the groups created in the static Science BN to yield standalone networks capable of linking to other networks via the interface nodes (input and output nodes), or being instantiated in other networks. Importantly, provided the interface remains intact, these OOBN sub-networks can be further expanded without affecting the structure of any other networks linking to them. As a consequence we have a powerful concept of parallel development by independent expert teams while retaining the overall cohesive model.

Figure 3.5: Science OOBN sub-network

In collaboration with the LSWG and based on the findings of Hamilton et al. (2007a) as described in Section 3.2.3, the static Lyngbya Science network was adapted in the following manner to incorporate monthly rainfall and temperature data and the lag effect of rainfall on the amount of groundwater and land run-off. First, the lag effect of rainfall on groundwater amount and land run-off was replicated by creating a Rainwater OOBN sub-network as shown in figure 3.6. In this OOBN, the Prev Groundwater and the Prev Land Run-off are input nodes (double-edged ellipse with a broken outer line), which enable connectivity to the previous time slice’s Ground Water Amount and Land Run-off nodes, respectively. The Rain-present input node enables the instances of the Rainwater OOBN to be bound to the rainfall relating to that instance, e.g. the November Rainwater OOBN instance will have November’s rainfall bound to the Rain-present input node. The Ground Water Amount and Land Run-off Load nodes are output nodes (double-edged ellipse with a solid outer line), which makes them visible to other networks and therefore allows them to be bound to input nodes in other networks.

Figure 3.6: Rainwater OOBN sub-network showing two output nodes, Groundwater Amount and Land Run-off Load, which are then connected to the input nodes Prev Groundwater and Prev Land Run-off in the next time slice

Finally the DOOBN was created with five time slices (figure 3.7): one time slice for each of the summer months (December to February), one for the end of spring (November) and one for the start of autumn (March). Every time slice has an instance of the Rainwater and Science sub-networks as well as the temperature and rainfall nodes for that month. Data from the Bureau of Meteorology was used to quantify the DOOBN, as well as the information contained in the initial static BN.

Figure 3.7: Five time slices forming the DOOBN for Lyngbya bloom initiation
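The way the five time slices are chained can be sketched as plain bookkeeping. The following Python fragment is an illustration only: it does not use the Hugin API, the class and function names are hypothetical, and leaving the first slice's "Prev" inputs unbound is an assumption, since the text does not say how the November slice is initialised.

```python
from dataclasses import dataclass, field

# End of spring, the three summer months, start of autumn.
MONTHS = ["November", "December", "January", "February", "March"]


@dataclass
class TimeSlice:
    """One monthly slice with the interface nodes named in the text."""
    month: str
    # Input nodes are placeholders that get bound to nodes elsewhere.
    inputs: dict = field(default_factory=lambda: {
        "Rain-present": None,
        "Prev Groundwater": None,
        "Prev Land Run-off": None,
    })
    # Output nodes are visible to other sub-network instances.
    outputs: tuple = ("Ground Water Amount", "Land Run-off Load")


def build_doobn(months):
    """Bind each slice to its own rainfall and to the previous slice's outputs."""
    slices = [TimeSlice(month) for month in months]
    for i, current in enumerate(slices):
        current.inputs["Rain-present"] = f"{current.month} rainfall"
        if i > 0:  # one-month lag: outputs of slice i-1 feed slice i
            previous = slices[i - 1]
            current.inputs["Prev Groundwater"] = f"{previous.month}: Ground Water Amount"
            current.inputs["Prev Land Run-off"] = f"{previous.month}: Land Run-off Load"
    return slices


doobn = build_doobn(MONTHS)
for time_slice in doobn:
    print(time_slice.month, time_slice.inputs)
```

Because only the interface nodes are referenced, the internals of each sub-network could change without altering this wiring.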


As can be seen in figure 3.8, the rainfall information for a particular month is bound to the Rain-present input node in the Rainwater and Science model sub-network instances for that month, and the Groundwater Amount and Land Run-off output nodes from one month bind to the Prev Groundwater and Prev Land Run-off input nodes of the following month, respectively.

Figure 3.8: Expanded sub-network instances in Hugin®, showing the interface nodes for each instance. The input and output nodes are represented here as ellipses with broken and solid lines, respectively. Also evident are the directed links between the sub-network instances of the same and the next time slice, so that information from one time slice can flow into the next time slice.

The point and diffuse nutrient sources contributing to the Management Network for Lyngbya initiation included: aquaculture, composting, onsite sewage, poultry, waste disposal, waste water treatment plant, agriculture, artificial development, development and clearing, extractive industries, forestry, grazing, natural vegetation and stormwater. The sources and nutrients identified by the management committee are shown in Table 3.3.


Table 3.3: Point and diffuse sources contributing nutrients to Deception Bay

Source                    Point (P) or Diffuse (D)   Nitrogen   Phosphorus
Aquaculture               P                          X          X
Composting                P                          X          X
Onsite Sewage             P                          X          X
Poultry                   P                          X          X
Waste Disposal            P                          X          X
Waste Water Treatment     P                          X          X
Agriculture               D                          X          X
Artificial Development    D                          X          X
Developing & Clearing     D                          X          X
Extractive                D                          X          X
Forestry                  D                          X          X
Grazing                   D                          X          X
Natural Vegetation        D                          X          X
Stormwater                D                          X          X

Iron and Organics are additionally marked for a subset of these sources.

An extract of the Management Network, which identifies and locates point and diffuse sources of nutrients for the Mellum Creek Sub-catchment, visually represented in Netica®, is shown in figure 3.9.

Figure 3.9: Extract of the Management Network for Mellum Creek Sub-catchment, a visual representation of the sub-catchment, showing point and diffuse sources of nutrients. The inset shows the complete Management Network


Scenario modelling predicted higher probabilities of Lyngbya bloom initiation during the summer months and confirmed the temporal nature of Lyngbya bloom initiation. Incorporating this behaviour resulted in the DOOBN for Lyngbya bloom initiation (figure 3.7) being developed as outlined above, with one-month lag effects included for groundwater amount and land run-off. As shown in figure 3.10 below, the BN predicts a sharp increase in the probability of initiation of a Lyngbya bloom from the end of spring (November) to the first month of summer (December). The increased probability continued during the next two summer months, with a slight fall in autumn (March). Although these predicted probabilities for Lyngbya bloom initiation are low, the increasing trend in bloom initiation is clearly visible. When evidence of summer rainfall was added to these time slices we observed a more dramatic increase. For example, the probability of bloom initiation was predicted as 52% when evidence of a summer rainfall event was entered into the December time slice. This compares to 42% in the original static annual BN model.

Figure 3.10: Probability of Lyngbya bloom initiation

3.4 Discussion

This paper describes an Integrated Bayesian Network approach applied to the initiation of Lyngbya blooms. The aim was to present an exposition of BN methodology applied to a complex ecological problem such as Lyngbya bloom initiation and to illustrate how it can be used to integrate models for different aspects of the same issue. We have illustrated the process that could be followed to integrate two static BNs and another type of model (such as the E2 model of the Whole of Catchment) to achieve an integrated BN. The IBN approach described here can also be used for investigating other features of this organism, such as growth, biomass and decay, through appropriate changes to the Science Network. These networks are currently being developed. The Integrated Network approach is also conceptually suitable for investigating other outcomes of interest that are impacted by nutrient outputs and water movement in a catchment.

It is noted that it is beyond the scope of the present paper to provide an actual test of the utility of BNs for predicting cyanobacterial blooms. The paper therefore does not include a comparison of the predictions against classical multivariate techniques; a test of the BN's own output reliability, that is, whether the probabilistic estimate of the likelihood of the BN's output is correct for the target data set; a clear presentation of exactly what data are being used; or a sufficient amount of data to first build and refine the model on one data set and then test it on a previously unseen set of data. However, the Science BN, which has been adopted by Healthy Waterways, will be validated through future data collected as part of the next phase of the Lyngbya project.

More broadly, the general approach proposed in this paper is applicable to environmental or other outcomes involving both scientific and management considerations. Information arising from expert knowledge, data and research can be formally conceptualized and quantified through Science and Management Networks, and combined into an Integrated Network. Such an approach involves definition of the problem or outcome of interest, agreement as to significant contributing factors and their definitions and pathways which impact on this outcome, and identification and integration of information that allows quantification of these factors and impacts. The benefits of such an approach include a much greater specification of the issue at hand or research focus, buy-in from diverse stakeholders, consolidation and formalisation of information, an audit trail for decision-making and future research, and quantitative outcomes in the form of probability statements about the outcome of interest.

In the static Lyngbya BN (figure 3.4) similar factors were grouped together and colour coded as a visual aid. The nature of the Science BN enabled a simple conversion of the network to an OOBN (figure 3.6), with a sub-network for each group of factors and interface nodes providing the communication links to other OOBNs (figure 3.8).
In the same way many complex BNs can be simplified by abstracting the network to a higher level to include sub-networks of logically grouped factors, which in turn can include other sub-networks, thereby having several levels of abstraction. An important feature of the OOBN sub-networks is that they can be developed simultaneously by the various expert groups who are responsible for them. When the sub-networks have been quantified, tested and ratified, they are integrated into the master network containing instances of those sub-networks.

The extension to a DOOBN not only improved prediction but also enhanced interpretability of the network. The inclusion of time-specific dynamics for temperature and water was more consistent with the conceptual framework of Lyngbya behaviour held by both science and management stakeholders. Moreover, it is more straightforward to include expert opinion and data of a temporal nature in this expanded model. It is suggested that for other complex ecological systems, the additional complexity of a DOOBN is more than compensated for by the increased flexibility of representation of information and acceptability of the outputs. Finally, the creation of an IBN to combine multiple networks which describe different aspects of an outcome of interest is an effective way of providing a cohesive, quantifiable and auditable tool for better understanding and coordination of multi-faceted environmental problems.

Acknowledgements

Financial assistance was provided by the Environmental Protection Agency and Australian Government through the South East Queensland Healthy Waterways Partnership, the ARC Centre for Dynamic Systems and Control, and QUT Institute for Sustainable Resources. We fully acknowledge the contributions of the Lyngbya Management Working Group and the Lyngbya Science Working Group. For helpful comments on the manuscript we acknowledge Kathleen Ahern, Barry Hart and an anonymous reviewer.

3.5 References

Abal E G, Greenfield P F, Bunn S E and Tarte D M 2005 Healthy Waterways: Healthy Catchments – An Integrated Research/Management Program to Understand and Reduce Impacts of Sediments and Nutrients on Waterways in Queensland, Australia, Frontiers of WWW Research and Development - APWeb 2006, (Harbin, China: Springer Berlin / Heidelberg) pp 1126-35

ABS 2004 3222.0 - Population Projections, Australia, 2004 to 2101. (Canberra: Australian Bureau of Statistics)

Ahern K S, Ahern C R, Savige G M and Udy J W 2007 Mapping the distribution, biomass and tissue nutrient levels of a marine benthic cyanobacteria bloom (Lyngbya majuscula) Marine and Freshwater Research 58 883-904

Ahern K S, Udy J W and Pointon S M 2006 Investigating the potential for groundwater from different vegetation, soil and landuses to stimulate blooms of the cyanobacterium, Lyngbya majuscula, in coastal waters Marine and Freshwater Research 57 177-86

Ahmed B A, Matheny M E, Rice P L, Clarke J R and Ogunyemi O I 2009 A comparison of methods for assessing penetrating trauma on retrospective multi-center data Journal of Biomedical Informatics 42 308-16

Albert S, O’Neil J M, Udy J W, Ahern K S, O'Sullivan C M and Dennison W C 2005 Blooms of the cyanobacterium Lyngbya majuscula in coastal Queensland, Australia: disparate sites, common factors Marine Pollution Bulletin 51 428-37

Angeli D, De Leenheer P and Sontag E D 2007 A Petri net approach to the study of persistence in chemical reaction networks Mathematical Biosciences 210 598-618

Arquitt S and Johnstone R 2004 A scoping and consensus building model of a toxic blue-green algae bloom System Dynamics Review 20 179-98

Borsuk M E, Reichert P, Peter A, Schager E and Burkhardt-Holm P 2006 Assessing the decline of brown trout (Salmo trutta) in Swiss rivers using a Bayesian probability network Ecological Modelling 192 224-44

Bromley J, Jackson N A, Clymer O J, Giacomello A M and Jensen F V 2005 The use of Hugin to develop Bayesian networks as an aid to integrated water resource planning Environmental Modelling & Software 20 231-42

Dennison W C and Abal E G 1999 Moreton Bay Study: A Scientific Basis for the Healthy Waterways Campaign (Brisbane: South East Queensland Regional Water Quality Management)

Dennison W C, O’Neil J M, Duffy E J, Oliver P E and Shaw G R 1999 Blooms of the cyanobacterium Lyngbya majuscula in coastal waters of Queensland, Australia Bulletin de l’Institut Oceanographique, Monaco 19 501-6

eWater CRC 2007 E2 Catchment Modelling Toolkit, http://www.toolkit.net.au/Tools/E2, accessed on 15 April 2009

Fielding F, Alston C, Dwyer M, Hamilton G, Johnson S, McVinish R, Peterson N and Mengersen K 2007 LYNGBYA Task 2.3: Development of an Integrating Framework for the Lyngbya Research and Management Program 2005-2007 Bayesian Belief Networks. (Brisbane, Australia: Healthy Waterways Partnership) pp 1-39

Goss P J E and Peccoud J 1998 Quantitative Modeling of Stochastic Systems in Molecular Biology by Using Stochastic Petri Nets Proceedings of the National Academy of Sciences of the United States of America 95 6750-5

Hamilton G, McVinish R and Mengersen K 2007a Bayesian model identification and averaging for coastal algal bloom prediction.

Hamilton G S, Fielding F, Chiffings A W, Hart B T, Johnstone R W and Mengersen K 2007b Investigating the Use of a Bayesian Network to Model the Risk of Lyngbya majuscula Bloom Initiation in Deception Bay, Queensland Human and Ecological Risk Assessment 13 1271-9

Hepler A B and Weir B S 2008 Object-oriented Bayesian networks for paternity cases with allelic dependencies Forensic Science International: Genetics 2 166-75

Hugin 2007 Hugin. http://www.hugin.com, accessed on 23 October 2007

Janssens D, Wets G, Brijs T, Vanhoof K, Arentze T and Timmermans H 2006 Integrating Bayesian networks and decision trees in a sequential rule-based transportation model Eur. J. Oper. Res. 175 16-34

Jensen F V and Nielsen T D 2007 Bayesian Networks and Decision Graphs (New York: Springer Verlag)

Kjaerulff U 1995 dHugin - a computational system for dynamic time-sliced Bayesian networks International Journal of Forecasting 11 89-111

Koller D and Pfeffer A 1997 Object-Oriented Bayesian Networks. In: Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-97), (Providence, Rhode Island) pp 302-13

Lauritzen S L, Dawid A P, Larsen B N and Leimer H G 1990 Independence properties of directed Markov fields Networks 20 491-505

McCann R K, Marcot B G and Ellis R 2006 Bayesian belief networks: applications in ecology and natural resource management Canadian Journal of Forest Research 36 3053

Norsys 2007 Netica. http://www.norsys.com, accessed on 16 November 2007

Osborne N J, Shaw G R and Webb P M 2007 Health effects of recreational exposure to Moreton Bay, Australia waters during a Lyngbya majuscula bloom Environment International 33 309-14

Osborne N J T, Webb P M and Shaw G R 2001 The toxins of Lyngbya majuscula and their human and ecological health effects Environment International 27 381-92

Park M-H and Stenstrom M K 2008 Classifying environmentally significant urban land uses with satellite imagery Journal of Environmental Management 86 181-92

Paul V J, Thacker R W, Banks K and Stjepko G 2005 Benthic cyanobacterial bloom impacts the reefs of South Florida (Broward County, USA) Coral Reefs 24 693-7

Pearl J 1988 Probabilistic Reasoning in Intelligent Systems (San Francisco, California: Morgan Kaufmann Publishers Inc)

Pollino C A, White A K and Hart B T 2007 Examination of conflicts and improved strategies for the management of an endangered Eucalypt species using Bayesian networks Ecological Modelling 201 37-59

Sadoddin A, Letcher R A, Jakeman A J and Newham L T H 2005 A Bayesian decision network approach for assessing the ecological impacts of salinity management Mathematics and Computers in Simulation 69 162-76

Smith C S, Howes A L, Price B and McAlpine C A 2007 Using a Bayesian belief network to predict suitable habitat of an endangered mammal - The Julia Creek dunnart (Sminthopsis douglasi) Biological Conservation 139 333-47

Stielow S and Ballantine D L 2003 Benthic cyanobacterial, Microcoleus lyngbyaceus, blooms in shallow, inshore Puerto Rican seagrass habitats, Caribbean Sea Harmful Algae 2 127-33

Taroni F, Aitken C, Garbolino P and Biedermann A 2006 Bayesian Networks and Probabilistic Inference in Forensic Science (Chichester, England: John Wiley & Sons, Ltd)

Uusitalo L 2007 Advantages and challenges of Bayesian networks in environmental modelling Ecological Modelling 203 312-8

Watkinson A J, O'Neil J M and Dennison W C 2005 Ecophysiology of the marine cyanobacterium, Lyngbya majuscula (Oscillatoriaceae) in Moreton Bay, Australia Harmful Algae 4 697-715

Weber P and Jouffe L 2006 Complex system reliability modelling with Dynamic Object Oriented Bayesian Networks (DOOBN) Reliability Engineering & System Safety 91 149-62

Wilson A G, Graves T L, Hamada M S and Reese C S 2006 Advances in Data Combination, Analysis and Collection for System Reliability Assessment Statistical Science 21 514-31


Chapter 4 Integrating Bayesian Networks and a GIS-based Nutrient Hazard map of Lyngbya majuscula

This chapter has been written as a conference paper² and is presented in the format dictated by the conference organisers. Due to the restriction in length of a conference paper, an Appendix containing supplementary diagrams is included as part of this chapter. We describe a flexible process to integrate three models which were created during the Lyngbya Research and Management Program (2005-2007). Two models, the science and management networks, are mentioned in Chapter 3 and the third model is a GIS-based model. The latter is a Nutrient Hazard Map and was produced by co-authors Mr Shane Pointon, Mr Chris Vowles and Mr Col Ahern to give an indication of an area’s potential to export nutrients of concern to coastal waters. We demonstrate the integration process by an example scenario where the outcome of this scenario is expressed as a probability of Lyngbya bloom initiation. The integrated model provides a better understanding of the impact of changes in management practices in the catchment area on Lyngbya bloom initiation and can be automated for use as a management tool to assess projected impact on Lyngbya as a result of a proposed change in land use in the catchment area.

² The paper, for which I am first author, was refereed and accepted for the Queensland Coastal Conference 2009, Waves of Change, held at the Gold Coast on 12-15 May 2009. An extended version of this paper is in preparation for submission to Human and Ecological Risk Assessment. The diagrams being included in the journal are shown in the Appendix.

Integrating Bayesian Networks and a GIS-based Nutrient Hazard map of Lyngbya majuscula

Sandra Johnson¹, Kerrie Mengersen¹, Col Ahern², Shane Pointon², Chris Vowles², Kathleen Ahern³
¹ Queensland University of Technology, GPO Box 2434, Brisbane QLD 4001
² Natural Resources and Sciences, NRW, Indooroopilly QLD 4068
³ Environmental Protection Agency, PO Box 15155, City East QLD 4002

4.1 Introduction
There is an abundance of expert modelling software available to the environmental scientist and manager. Consequently there is a multitude of models and simulation outputs, each focussing on a particular aspect of an ecological issue of concern. The integration of the results and the consolidation of knowledge captured in these disparate modelling software systems is a common problem facing the environmental worker. Here we look at three models of the harmful algal bloom, Lyngbya majuscula. Two Bayesian networks were created to model the scientific and management factors in Lyngbya bloom initiation and a GIS-based model was produced to assess the relative hazard of the nutrients of concern in the catchment, believed to affect the growth, duration and severity of a Lyngbya bloom. We suggest a flexible process to integrate these models and demonstrate this process by an example scenario. The outcome of this scenario is expressed as a probability of Lyngbya bloom initiation. We then discuss ways of further automating this process and highlight where the points of integration and flexibility are.

4.2 Background
Lyngbya majuscula is a cyanobacterium (blue-green algae) occurring naturally in tropical and subtropical coastal areas worldwide (Dennison et al., 1999; Arquitt and Johnstone, 2004). Deception Bay, in Northern Moreton Bay, Queensland, has a history of Lyngbya blooms (Watkinson et al., 2005; Ahern et al., 2007), and forms a case study for this investigation. The South East Queensland (SEQ) Healthy Waterways Partnership, a collaboration between government, industry, research and the community, was formed to address issues affecting the health of the river catchments and waterways of South East Queensland (Abal et al., 2005). The Partnership coordinated the Lyngbya Research and Management Program (2005-2007) which culminated in a Coastal Algal Blooms (CAB) Action Plan for harmful and nuisance algal blooms, such as Lyngbya majuscula. This first phase of the project was predominantly of a scientific nature and also facilitated the collection of additional data to better understand Lyngbya blooms. The second phase of this project, SEQ Healthy Waterways Strategy 2007-2012, is now underway to implement the CAB Action Plan (SEQ Healthy Waterways Partnership, 2007) and as such is more management focussed.


As part of the first phase of the project, Science and Management models for the initiation of a Lyngbya bloom were built using Bayesian Networks (BN). The structure of the Science BN was built by the Lyngbya Science Working Group (LSWG), which was drawn from diverse disciplines (Hamilton et al., 2007). The BN was then quantified with annual data and expert knowledge. The Management network was constructed during a series of meetings of the Lyngbya Management Working Group (LMWG) with members from local and state government and private organisations. It is a graphical representation of the catchment area showing the nutrients of concern being released into the Bay and identifying the point and diffuse sources discharging these nutrients. The second phase of the project identified the need to produce a Hazard map of the nutrients of concern (Organic Carbon (OC), Phosphorus (P), Bioavailable Iron (Fe) and Nitrogen (N)) known to affect the growth, extent and duration of a Lyngbya bloom. The soil types and groundwater pH levels are also included, as they affect the solubility of the nutrients and the ability of nutrients to leach from the soil. The Hazard map is a GIS-based model and has a hazard rating for each unique land parcel, obtained by combining the hazard ratings on five GIS layers. Closer integration of these three models is required to consolidate the available knowledge and to improve management of the catchment area, thereby providing a better understanding of the impact of changes in management practices in the catchment area on Lyngbya bloom initiation, growth and duration. By integrating the Hazard map with the BNs previously developed, we can predict the probability of a Lyngbya bloom initiation in the Bay for a scenario of interest whilst taking on board the information captured in the Nutrient Hazard map.

4.3 Methods
In the Nutrient Hazard map the hazard ratings for every uniquely identified land parcel in the catchment are calculated by assigning a hazard rating (1 – 4) to each of the nutrients of concern in each of the six GIS coverages. The coverages are acid sulfate soils, landuse, soil type, groundwater, vegetation pre-clearing and vegetation post clearing. The full methodology and rationale for these coverages are given in Pointon et al. (2008). These six new layers are then superimposed over each other and the values added to produce a merged layer for each nutrient (e.g. merged Fe layer). The new merged layer is then multiplied by a proximity to streams factor for that nutrient to produce a final hazard layer for that nutrient (note: the proximity factor may vary for each nutrient). The final layers for the nutrients are then overlain and added up to produce a final model or hazard model/map (see Fig. 1 and 2; Pointon et al., 2008). In addition, there are two Bayesian networks (BN) for Lyngbya bloom initiation prediction. One is the Science BN, which focuses on nutrient and physical factors that were agreed by the LSWG to be the most influential contributors to the initiation of Lyngbya. The other is the Management BN, which focuses on management inputs that potentially influence the delivery of nutrients to the Bay. The 14 land uses of interest identified by the LMWG are six point sources (aquaculture, composting, on-site sewage, poultry, waste disposal and waste water treatment) and eight diffuse sources (agriculture, artificial development, developing and clearing, extractive industries, forestry, grazing, natural vegetation and stormwater).

The challenges in integrating these models are:
1. Mapping the land uses in the Hazard maps (74) to the point and diffuse sources in the Lyngbya management model (14)
2. Assigning a hazard number to each of the nutrients for each of the point and diffuse sources of the management model, which will be referred to as the ‘management hazard factor’
3. Translating the resulting management hazard values into probabilities in the conditional probability tables (CPT) of the Science BN for each of the land uses, to observe the effect on the probability of a Lyngbya bloom initiation
4. Translating and quantifying scenarios, e.g. changing from one land use to another, or a percentage change in a particular nutrient of concern

1. Mapping Land Uses to Point and Diffuse sources

On the Hazard Map there are 74 different land uses (LU). In contrast, there are only 14 land uses of interest identified by the LMWG and included in the Management network. The Hazard LUs need to be mapped to the Management LUs to provide communication between the models.

2. Assigning a Management hazard number to each source

There are inherent hazards associated with the various activities in the catchment area, and changes to those practices have implications for the possibility of Lyngbya blooms initiating in the Bay, as well as the severity, extent and duration of the bloom. The integration of the models has information flowing from Hazard Map → Management BN → Science BN via the LU mapping. The direction of information flow can also be reversed by applying the LU mapping in reverse. However, since the Hazard to Management LU mapping is ‘many to one’, it then becomes a ‘one to many’ mapping and the information has to be split according to some criteria. The most appropriate way would be to apportion the values using the area of the land parcel for that hazard land use divided by the overall area for the management land use. However, the option exists to use other criteria deemed to be suitable.

A Java program has been written to calculate a hazard value for each of the nutrients identified in the Hazard map via the ‘Hazard LU to Management LU’ map from the previous step. The hazard for each nutrient for a particular land use in the Hazard map is multiplied by the ‘coast and stream buffer distance’ hazard and the area of each unique land parcel. This value is then divided by the total area. In other words, the total hazard is weighted by the relative size of the land parcel with that land use and its coastal proximity. The flexibility exists to change the way in which the combined nutrient hazard value is calculated for each land parcel.

The program then translates the calculated hazards for that hazard land use to a management land use as dictated by the ‘Hazard LU to Management LU’ map. Since the mapping is many to one, there may be several rows for one management land use. All the rows for each management land use are added together, weighted by area, to arrive at the final ‘management hazard’ for that management land use. These calculated values constitute the set of ‘nutrient hazard factors’ for that management land use and the resulting table is shown in Table 4.1 below. Once again we have an opportunity to change the way in which the program calculates the overall hazard. For example, instead of weighting by area, weighting by the underlying hazard land use may be preferred. In other words, even though several hazard land uses map to one management land use, it may be felt that certain hazard land uses contribute more to the overall hazard of that management land use.

Table 4.1: Nutrient hazard factors for management land uses

Land Use                   Area (km²)     Fe      P      OC      N     pH
aquaculture                      4.97   6.32   5.28   10.04   5.28   1.76
composting                       4.17   2.11   1.62    2.61   1.62   1.62
on-site sewage                 927.27   2.22   3.10    2.60   3.10   1.55
poultry                          7.43   8.89   5.20    9.54   5.20   1.73
waste disposal                   7.76   6.47   5.46   10.27   5.46   1.82
waste water treatment          156.32   0.59   0.82    1.14   0.82   0.78
agriculture                  1,050.31   3.54   4.39    4.63   4.39   2.78
artificial development         513.68   2.91   2.40    3.11   2.40   1.71
developing and clearing      1,201.21   3.25   3.25    3.34   4.73   1.78
extractive industries           89.94   6.93   1.63    2.77   1.63   4.76
forestry                     1,641.19   3.98   2.21    5.34   1.11   3.32
grazing                     12,571.69   1.11   1.28    1.81   1.28   1.22
natural vegetation           4,096.52   2.42   1.50    4.79   1.41   2.77
stormwater                     180.67   2.46   1.43    2.49   1.43   1.60
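The aggregation described above can be illustrated with a short sketch. It is written in Python rather than the Java of the original program, the land use names, parcel values and function name are hypothetical, and it follows one reading of the text: for a single nutrient, each parcel's hazard rating is multiplied by its proximity rating and its area, and the products are summed per management land use and divided by the total area mapped to that land use.

```python
from collections import defaultdict

# Hypothetical extract of the 'Hazard LU to Management LU' map
# (the real map covers 74 hazard land uses and 14 management land uses).
LU_MAP = {
    "cropping": "agriculture",
    "irrigated horticulture": "agriculture",
    "plantation forestry": "forestry",
}

# Hypothetical parcels for one nutrient:
# (hazard land use, area in km^2, nutrient rating 1-4, proximity rating 1-4)
PARCELS = [
    ("cropping", 10.0, 3, 2),
    ("irrigated horticulture", 30.0, 4, 1),
    ("plantation forestry", 20.0, 1, 4),
]


def management_hazard(parcels, lu_map):
    """Area-weighted hazard per management land use for one nutrient."""
    weighted = defaultdict(float)  # sum of rating * proximity * area
    area = defaultdict(float)      # total area per management land use
    for hazard_lu, parcel_area, rating, proximity in parcels:
        mgmt_lu = lu_map[hazard_lu]  # many-to-one mapping
        weighted[mgmt_lu] += rating * proximity * parcel_area
        area[mgmt_lu] += parcel_area
    return {lu: weighted[lu] / area[lu] for lu in weighted}


hazards = management_hazard(PARCELS, LU_MAP)
print(hazards)
```

Swapping the area weights for another weighting scheme, as the text suggests, would only change the two accumulation lines inside the loop.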

3. Translating Management Hazard values into probability changes The Lyngbya Science BN reflects the current situation for the nutrients of concern and their contributions to the pool of dissolved nutrients, which is represented by the summary node Available Nutrient Pool (Dissolved). We therefore need to equate the current probabilities of the states of this node, ‘Enough’ (32.05%) and ‘Not enough’ (67.95%) to the current overall nutrient hazard as determined by the Hazard Nutrient Map. The overall hazard for a particular management land use should combine all the nutrients of concern, including pH and soil type. The hazard factors are assumed to be of equal importance and hence a simple average is used to calculate the overall hazard of each management land use. If certain nutrients, pH and/or soil type are shown to have a greater or lesser impact, 66

we can adjust the calculation in the program to be a more complex average which takes into account the relative impact of each hazard factor (nutrients of concern, pH and soil type). However, we first need to derive a single hazard value for the Hazard Map, representing the current situation for nutrients of concern. This value will form the base line for integration with the Lyngbya Science BN and is calculated by weighting the hazards according to the area (or another criteria of choice) of each management LU relative to the overall size of the catchment area. The resulting overall hazard value is 2.02. This absolute value does not have any direct meaning as such, but instead should be interpreted in relation to other such hazard factors. A nutrient hazard rating in the Hazard map may be construed to mean the perceived hazard that there will be ‘enough’ of that nutrient to cause an increase in growth, extent and duration of a Lyngbya bloom. The maximum hazard for any management land use is 16, which results from a nutrient hazard rating of 4 for the land use and 4 for the proximity rating. This represents the absolute certainty of a severe bloom (100% probability). Also the overall management hazard, when viewed relative to the maximum value, needs to be equated to the 32.05% probability of the Available Nutrient Pool (Dissolved) node in the Lyngbya Science BN. Then any nutrient hazards and changes in the current situation can be assessed relative to the current situation. Applying the same logic for each of the management land uses, results in Table 4.2 below, showing the average hazard rating and the corresponding probabilities for having enough of the dissolved nutrients available to cause a bloom. The latter value can then be used in the Science BN for scenario testing. 
Table 4.2: Management land use showing the average hazard from the hazard map and the corresponding probability of having enough dissolved nutrients in the available pool

Land Use                        Average management risk   Probability of 'enough'
aquaculture                     5.74                      67.92%
composting                      1.92                      35.94%
on-site sewage                  2.51                      43.87%
poultry                         6.11                      69.77%
waste disposal                  5.90                      68.73%
waste water treatment plant     0.83                      11.62%
agriculture                     3.95                      57.02%
artificial development          2.51                      43.77%
developing and clearing         3.27                      51.53%
extractive industries           3.54                      53.88%
forestry                        3.19                      50.85%
grazing                         1.34                      25.54%
natural vegetation              2.58                      44.59%
stormwater                      1.88                      35.45%
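The area weighting described in this section can be illustrated with a short Python sketch. It is not the author's program: the function names are hypothetical, and the two-land-use example inside the first call uses made-up shares purely to show the calculation. The second call uses only figures reported in this chapter: the baseline overall hazard of 2.02, the Table 4.2 hazards for natural vegetation (2.58) and agriculture (3.95), and the 18.24% share of the catchment under natural vegetation quoted in Section 4.4.

```python
def overall_hazard(hazards, shares):
    """Area-weighted mean hazard; `shares` are fractions of catchment area summing to 1."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("area shares must sum to 1")
    return sum(hazards[land_use] * shares[land_use] for land_use in shares)


def hazard_after_conversion(baseline, share, hazard_from, hazard_to):
    """Overall hazard after converting `share` of the catchment from one land use to another."""
    return baseline + share * (hazard_to - hazard_from)


# Hypothetical two-land-use catchment, only to show how the weighting works.
example = overall_hazard({"grazing": 1.0, "forestry": 3.0},
                         {"grazing": 0.5, "forestry": 0.5})

# Scenario from Section 4.4: natural vegetation (18.24% of the area) becomes agriculture.
new_hazard = hazard_after_conversion(baseline=2.02, share=0.1824,
                                     hazard_from=2.58, hazard_to=3.95)
print(round(new_hazard, 2))  # 2.27
```

Because the overall hazard is a weighted mean, a change to one land use shifts it by that land use's area share times the change in its hazard, which reproduces the value of 2.27 reported for the scenario.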

4. Translating and quantifying scenarios

In order to investigate the effect different scenarios have on Lyngbya bloom initiation, growth and duration, we need to establish a base line from which these scenarios can be evaluated. As mentioned previously, the current probability of having enough dissolved available nutrients in the Science BN is 32.05% and the probability of Lyngbya bloom initiation is 25.11%. We will use this bloom initiation probability as the reference point for any scenario testing. It is possible to adjust the likelihood of a node of interest (such as the bioavailable iron nutrient node or the available nutrient pool (dissolved) node) in the Science BN to be the same as the value calculated from the Hazard Map. All the interactions and directions of relationships in the BN are retained and the probabilities are adjusted based on the ‘findings’ entered for that node. This process of updating the likelihoods of the other nodes is referred to as belief propagation. So as we find out more about a nutrient of concern, we are able to update the BN with this new information and examine how it affects the other key factors in the network and ultimately the effect on the probability of Lyngbya bloom initiation. From Table 4.2 we can see that if we are interested in a certain nutrient of concern, we can use the corresponding probability in the Science BN to examine how it affects the probability of a Lyngbya bloom initiation.

4.4 Results

To illustrate how scenario testing may be performed, we use the hypothetical situation of changing the management land use from natural vegetation to agriculture throughout the catchment area. We are interested to see how this change may affect the probability of Lyngbya bloom initiation. To simulate this hypothetical scenario, we need to apply the hazard values for ‘nutrients of concern’ for agriculture to the areas assigned as natural vegetation. We therefore need to change the hazard values for a total area of 4,097 km². Once this has been done, we recalculate the overall hazard value. The new overall hazard value is 2.27 (previously 2.02) and consequently the probability of ‘enough’ nutrients of concern for this scenario is 40.83%, a fairly substantial increase compared to the previous value of 32.05%. When we update the BN with this new value and propagate the belief through the network, the probability of a Lyngbya bloom initiation increases from 25.11% to 31.34%. The effect of this land use change is diluted by the fact that the proportion of the catchment designated as natural vegetation is only 18.24%.

To better appreciate the potential impact of that type of land use, it is interesting to disregard area and instead use the probabilities from Table 4.2 for there being ‘enough’ of the Available Nutrient Pool in the BN for a particular land use. We can then observe the change in probability of Lyngbya bloom initiation between agriculture and natural vegetation when this information is propagated through the network. Following this approach changes the probability of a Lyngbya bloom initiation from 34.01% for natural vegetation to 42.84% for agriculture.
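The probabilities reported in this section can be cross-checked with the law of total probability. If only the probability p of ‘Enough’ in the Available Nutrient Pool (Dissolved) node is changed, then P(bloom) = P(bloom | enough) p + P(bloom | not enough) (1 - p), which is linear in p. The Python sketch below is a consistency check under that assumption, not a reproduction of the Science BN: the two conditional probabilities it derives are inferred from the reported figures and are not stated in the text, and the variable names are illustrative.

```python
# (P('Enough'), P(bloom initiation)) pairs reported in the text.
baseline = (0.3205, 0.2511)      # current situation
agriculture = (0.5702, 0.4284)   # Table 4.2 probability for agriculture
checks = [
    (0.4083, 0.3134),            # natural vegetation converted across the catchment
    (0.4459, 0.3401),            # Table 4.2 probability for natural vegetation
]

# Fit the line through the two outer points.
slope = (agriculture[1] - baseline[1]) / (agriculture[0] - baseline[0])
p_bloom_given_not_enough = baseline[1] - slope * baseline[0]   # inferred, not from the source
p_bloom_given_enough = p_bloom_given_not_enough + slope        # inferred, not from the source


def p_bloom(p_enough):
    """Bloom initiation probability implied by the law of total probability."""
    return (p_bloom_given_enough * p_enough
            + p_bloom_given_not_enough * (1.0 - p_enough))


for p_enough, reported in checks:
    print(f"P(enough)={p_enough:.4f}  predicted={p_bloom(p_enough):.4f}  reported={reported:.4f}")
```

The two intermediate points agree with the reported 31.34% and 34.01% to within rounding, which suggests that the four scenario results are mutually consistent. The inferred conditional probabilities are roughly 0.73 and 0.02.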


It is also possible to make some scenario changes in the Hazard map, map the land uses to management land uses and repeat the procedure above. This will then finally give the change in the likelihood of a Lyngbya bloom initiation.

4.5 Discussion

The integration process is currently partially automated, but it may be desirable to automate it further if it is to be used as a tool to assess the impact of changes in the catchment or for business as usual. A graphical user interface (GUI) which enables the dynamic mapping of land uses and then provides output in report or spreadsheet format for the different management land uses would be beneficial. This requires the subsequent automated invocation of the BN Java application programming interface (API) so that the overall hazard values, which have been translated into likelihoods, may be propagated through the network to yield the change in the probability of a bloom initiation. Furthermore, a GUI which allows scenario entry will provide the end-user with a powerful tool in discussions and negotiations about management actions in the catchment.

4.6 Conclusions

This paper describes and demonstrates the integration of two Bayesian Networks and a GIS-based model for Lyngbya majuscula to overcome a common problem faced by environmental researchers in combining outcomes from disparate expert modelling systems. By using a program interface, there are several opportunities to customise the way in which this integration occurs. It also provides the option of automating the interface so that the different ways of integration and scenario testing may be performed by an environmental manager or other person interested in assessing the effects of different management actions on an environmental issue of concern.

TAKE HOME MESSAGE

Bayesian networks with interfacing bespoke Java programs are able to provide a flexible approach to achieve the integration of multiple ecological models constructed for an issue of concern, such as Lyngbya bloom initiation. This culminates in a powerful tool to assess the impact of changes to the discharge of nutrients of concern as a result of a change in land use, climate or any other scenario of interest.

4.7 Acknowledgements This research was funded by the SEQ Healthy Waterways Partnership and the EPA’s Coastal Algal Bloom (CAB) Action Plan. The contributions of the Lyngbya Science and Management teams are fully acknowledged.


4.8 References

Abal E G, Greenfield P F, Bunn S E and Tarte D M 2005 Healthy Waterways: Healthy Catchments – An Integrated Research/Management Program to Understand and Reduce Impacts of Sediments and Nutrients on Waterways in Queensland, Australia, Frontiers of WWW Research and Development - APWeb 2006, (Harbin, China: Springer Berlin / Heidelberg) pp 1126-35

Ahern K S, Ahern C R, Savige G M and Udy J W 2007 Mapping the distribution, biomass and tissue nutrient levels of a marine benthic cyanobacteria bloom (Lyngbya majuscula) Marine and Freshwater Research 58 883-904

Arquitt S and Johnstone R 2004 A scoping and consensus building model of a toxic blue-green algae bloom System Dynamics Review 20 179-98

Dennison W C, O’Neil J M, Duffy E J, Oliver P E and Shaw G R 1999 Blooms of the cyanobacterium Lyngbya majuscula in coastal waters of Queensland, Australia Bulletin de l’Institut Oceanographique, Monaco 19 501-6

Hamilton G S, Fielding F, Chiffings A W, Hart B T, Johnstone R W and Mengersen K 2007 Investigating the Use of a Bayesian Network to Model the Risk of Lyngbya majuscula Bloom Initiation in Deception Bay, Queensland Human and Ecological Risk Assessment 13 1271-9

Pointon S M, Ahern K S, Ahern C R, Vowles C M, Eldershaw V J and Preda M 2008 Modelling land based nutrients relating to Lyngbya majuscula (Cyanobacteria) growth in Moreton Bay, southeast Queensland, Australia Nature - Memoirs of the Queensland Museum 54 377-90

SEQ Healthy Waterways Partnership 2007 South East Queensland Healthy Waterways Strategy 2007-2012: Coastal Algal Blooms Action Plan

Watkinson A J, O'Neil J M and Dennison W C 2005 Ecophysiology of the marine cyanobacterium, Lyngbya majuscula (Oscillatoriaceae) in Moreton Bay, Australia Harmful Algae 4 697-715


4.9 Appendix to Conference Paper These figures will form part of the journal paper currently in preparation for submission to the journal Human and Ecological Risk Assessment, and which is based on the conference paper included here.

Figure A.1: Extract of the Management Network for Mellum Creek Subcatchment, a visual representation of the sub-catchment, showing point and diffuse sources of nutrients. The inset shows the complete Management Network


Figure A.2: Meso-scale nutrient hazard map of the Deception Bay and southern Pumicestone Passage area including the addition of pine plantations, Melaleuca, and ASS (Pointon et al 2008)


Figure A.3: Conceptual diagram of scenario testing (Use Case Diagram)

Figure A.4: Lyngbya Science BN reviewed on 1st December 2008


Figure A.5: Activity Diagram for Example Scenario: Change natural vegetation to agriculture


Chapter 5

Modelling cheetah relocation success in southern Africa using an Iterative Bayesian Network Development Cycle

This chapter has been written as a journal paper³ and introduces a new heuristic, the Iterative Bayesian Network Development Cycle (IBNDC), which details the processes involved in BN modelling. It encompasses the creation, validation and maintenance of a BN model. We then apply the IBNDC heuristic to model the success of cheetah relocations in South Africa and Botswana. A workshop was held in South Africa and attended by cheetah experts from South Africa and Botswana, where several networks were formulated to distinguish between the unique relocation experiences and conditions in the two countries. The IBNDC approach is conducive to Object Oriented BNs, facilitating the creation of dynamic networks and the refinement, reuse and redesign of existing BNs. These characteristics are illustrated in chapters 6 and 7.

³ The revised journal paper has been resubmitted to Ecological Modelling and is currently under review.

Modelling cheetah relocation success in southern Africa using an Iterative Bayesian Network Development Cycle

Sandra Johnson (a,*), Kerrie Mengersen (a), Alta de Waal (b), Kelly Marnewick (c,e), Deon Cilliers (f), Ann Marie Houser (d), Lorraine Boast (d)

(a) School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia
(b) Meraka Institute, Council for Scientific and Industrial Research (CSIR), Meiring Naudè Road, Pretoria, 0001, South Africa
(c) Carnivore Conservation Group, Endangered Wildlife Trust, P. Bag X11, Parkview, 2122, Johannesburg, South Africa
(d) Cheetah Conservation Botswana, Private Bag 0457, Gaborone, Botswana
(e) Centre for Wildlife Management, University of Pretoria, Pretoria, 0002, South Africa
(f) Wildlife Conflict Prevention Group, Endangered Wildlife Trust, P. Bag X11, Parkview, 2122, Johannesburg, South Africa


Abstract

Relocation is one of the strategies used by conservationists to deal with problem cheetahs in southern Africa. The success of a relocation event, and the factors that influence it within the broader context of long-term viability of wild cheetah metapopulations, was the focus of a Bayesian Network (BN) modelling workshop in South Africa. Using a new heuristic, the Iterative Bayesian Network Development Cycle (IBNDC) described in this paper, several networks were formulated to distinguish between the unique relocation experiences and conditions in Botswana and South Africa. There were many common underlying factors, despite the disparate relocation strategies and sites in the two countries. The benefit of relocation BNs goes beyond the identification and quantification of the factors influencing the success of relocations and population viability. They equip conservationists with a powerful communication tool in their negotiations with land and livestock owners, which is key to the long-term survival of cheetahs in southern Africa. Importantly, the IBNDC provides the ecological modeller with a methodological process that combines several BN design frameworks to facilitate the development of a BN in a multi-expert and multi-field domain.

Keywords: Acinonyx jubatus; Bayesian network; cheetah metapopulation; predator human conflict; relocation; iterative approach; IBNDC


5.1 Introduction The current status of the cheetah (Acinonyx jubatus) is Vulnerable, VU C2a(i). This status means that it is considered to have a high risk of extinction in the wild, that the estimated population size is fewer than 10,000 mature cheetahs and that there is a continuing decline in the numbers of mature cheetahs. There is also no subpopulation with an estimated size of more than 1000 mature cheetahs (IUCN, 2007). At the turn of the 20th century cheetahs still inhabited vast areas of Africa and Asia and were found in at least 44 countries, stretching from the Cape of Good Hope to the Mediterranean, the Arabian Peninsula and the Middle East, India and Pakistan in southern Asia and the southern provinces of Russia (GCCAP, 2002; Marker, 2002). Since then the cheetah population worldwide has declined from approximately 100 000 cheetahs in Africa and Asia to around 250 cheetahs (Acinonyx jubatus venaticus) in Iran (IUCN, 2007) and in sub-Saharan Africa the number of cheetahs (Acinonyx jubatus jubatus) is estimated at between 12 000 and 15 000 (Marker et al., 2003). Sub-Saharan Africa contains the only remaining viable wild populations (Marker et al., 2003), with Kenya and Tanzania in East Africa and Namibia and Botswana in southern Africa, being the two main cheetah strongholds in Africa (GCCAP, 2002). The main reasons for this negative trend in cheetah numbers are interspecific competition, increased contact and conflict with humans and fragmented habitat (GCCAP, 2002). Cheetah conservation organisations in southern Africa face the daunting task of addressing these issues to ensure the long-term viability of wild cheetah populations. In South Africa and Botswana relocation is one of several methods used to conserve cheetahs (Purchase et al., 2007). This involves trapping cheetahs that are perceived by landowners to be problem animals and releasing them into other areas. 
The successful relocation of a cheetah, or group of cheetahs, was the focus of a Bayesian Network (BN) modelling workshop held in South Africa which was attended by cheetah experts from South Africa and Botswana. The objective of the BN model was to increase the survival of the greater cheetah metapopulation, informed by the success of a relocation event. Bayesian networks are popular for modelling complex and multi-faceted environmental issues (Castelletti and Soncini-Sessa, 2007; Marcot et al., 2006; Smith et al., 2007) such as predator relocations. A BN is a mathematical model (Pearl, 1988; Neapolitan, 1990; Jensen and Nielsen, 2007) that consists of a graphical depiction of random variables and a probabilistic framework that describes the strength of the relationships between the variables (Jensen and Nielsen, 2007). BNs are a useful statistical tool for collating, organising and formalising information such as empirical data, model outputs, secondary sources and expert knowledge about the issue of concern (Uusitalo, 2007). They have been used in very diverse applications, such as forensic science (Taroni et al., 2006), toxic algal bloom initiation (Hamilton et al., 2007), environmental impact of fire-fighting methods (De Waal and Ritchey, 2007) and urban land use classification from satellite imagery (Park and Stenstrom, 2008). BNs take the graphical form of a directed acyclic graph, comprising a set of random variables (factors) represented as nodes and linked through directed arrows to one or more variables depicting the outcome(s) of interest (target node(s)) (Jensen and Nielsen, 2007). The network is quantified through a series of conditional probabilities based on the available information (Borsuk et al., 2006; Taroni et al., 2006; Jensen and Nielsen, 2007).

Various types of BNs (traditional, object oriented, dynamic) may be used to model a variety of ecological problems. Traditional BNs are suited to many situations, but are inadequate when modelling large complex domains (Uusitalo, 2007). These are better served by Object Oriented Bayesian networks (OOBN). OOBNs provide a framework for modelling large complex data structures by simplifying the knowledge representation and facilitating reuse of nodes and network fragments (Koller and Pfeffer, 1997). Another specialisation of the BN is the Dynamic Bayesian Network, which is essentially a traditional BN with a temporal dimension (Weber and Jouffe, 2006) where interdependent entities change over time (Ross and Zuviria, 2007). Dynamic BNs are used to model time series data (Ghahramani, 2001) and are ideally suited to object oriented modelling techniques (Jensen and Nielsen, 2007).

This paper describes the development of a BN for the evaluation of a successful relocation of wild cheetahs using the new IBNDC heuristic. Success is here defined in terms of short-term survival of the relocated cheetah(s) and long-term population viability of the cheetah population in light of the relocation. The paper synthesises the experience of South Africa and Botswana as representative countries for cheetah relocation. These two countries were selected for three main reasons. First is the range of relocation experiences: although both have experience in relocations, South Africa has had substantial experience whereas relocation is still relatively new in Botswana.
Second are their geographical locations: they are neighbouring countries with predators known to move freely between the countries (Marnewick et al., 2007), therefore conservation management practices in one country are bound to impact on the other. In particular cheetah home ranges are known to span both countries. Third is the variety in relocation sites: the two countries have quite different types of areas available for relocation and their practices in relocating cheetahs also differ. The paper proposes an original heuristic method, an iterative BN development cycle (IBNDC) to create a decision-support system that consolidates available information and experience of experts from different countries, organises opinion to better understand the inter-relation of factors that affect relocation of cheetahs and metapopulation viability, and helps to guide the choice of sites for a successful relocation.
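The description of a BN as a directed acyclic graph quantified by conditional probabilities can be made concrete with a very small example. The Python sketch below builds a hypothetical two-node network (site quality influencing relocation success) and answers one predictive and one diagnostic query by enumerating the joint distribution. The node names and all probabilities are invented for illustration and are not taken from the cheetah relocation networks.

```python
# Hypothetical two-node network: Site quality -> Relocation success.
p_site = {"good": 0.6, "poor": 0.4}                # prior for the parent node (assumed values)
p_success_given_site = {"good": 0.8, "poor": 0.3}  # CPT for the child node (assumed values)

# Joint distribution from the chain rule: P(site, success) = P(site) * P(success | site).
joint = {}
for site, prior in p_site.items():
    joint[(site, True)] = prior * p_success_given_site[site]
    joint[(site, False)] = prior * (1.0 - p_success_given_site[site])

# Predictive query: marginal probability of a successful relocation.
p_success = sum(prob for (site, ok), prob in joint.items() if ok)

# Diagnostic query (Bayes' rule): probability the site was good given success was observed.
p_good_given_success = joint[("good", True)] / p_success

print(f"P(success) = {p_success:.2f}")
print(f"P(site = good | success) = {p_good_given_success:.2f}")
```

BN software performs the same calculation on much larger graphs using more efficient propagation algorithms, but the underlying quantities are those shown here.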


5.2 Methods

5.2.1 Study Area

South Africa and Botswana had different options for relocation sites, but the key factors identified as critical to ensure successful relocation were endorsed by all. Relocation sites in South Africa (Fig. 5.1) are scattered widely throughout the country, with several located at the northern border with Botswana where the South African free roaming cheetah population occurs (Marnewick et al., 2007). The majority of South Africa (approx 70%) is privately owned, with state-owned, protected areas totalling less than 5% (Cummings, 1991). This has necessitated the creation of a metapopulation management plan to manage geographically separated populations of endangered predators such as cheetahs and wild dogs (Lycaon pictus) as a whole (Lindsey et al., 2005).

Figure 5.1: Current cheetah distribution and relocation sites in South Africa (Marnewick et al., 2007)

Cheetahs are distributed sparsely throughout Botswana (Fig. 5.2) with roughly two thirds of the land area providing suitable habitat. This includes areas in the arid zone and the lush Okavango Delta in the north-west of the country. Although Botswana has large areas that could support cheetah populations, there are concerns about the degradation of this habitat, because desertification, overgrazing and lack of fresh water resources are serious environmental issues that face Botswana (BCP, 2002).


Figure 5.2: Cheetah estimates in Botswana by predator management zones (Klein, 2007)

Four different types of relocation sites were identified. These comprised Protected and Fenced, Protected and Unfenced, Unprotected and Unfenced, and In Situ. In situ relocation occurs when the land owners where the cheetahs were trapped agree to have the animals released back into their home range. This may happen after consultation between land owners and conservationists. In South Africa relocations are mainly done into fenced protected areas and occasionally in situ relocations are done. The options in Botswana are relocations into unfenced protected areas or unprotected areas and in situ relocation.

5.2.2 Iterative Bayesian Network Development Cycle (IBNDC)

The Bayesian networks were conceptualised and quantified during a four-day workshop, bringing together cheetah experts from South Africa and Botswana and statisticians from South Africa and Australia. As described in Section 5.1, a Bayesian network typically comprises one or more target nodes and a set of factors linked directly or indirectly with these node(s). For cheetah relocation, the design requirements were: multiple linked target nodes (success of a particular relocation event and a viable free-ranging cheetah population), multiple networks for a given target node (success of a relocation event to relocation sites with different characteristics) and growing expert knowledge and experience. In addition, it was agreed to be prudent to cater for possible expansion or transformation of these networks into dynamic or object oriented BNs and to satisfy adaptive management requirements so that the cheetah relocation BNs learn from subsequent relocation events. Consequently the uncertainty present at the time of modelling diminishes in light of new evidence and experience (Bosch et al., 2003; Smith et al., 2007).

A modelling approach, the Iterative BN Development Cycle (IBNDC), was developed to suit these varied objectives and was used at the workshop. It consists of two primary processes: a Core Process and an Iterative Process. The Core Process is performed once when modelling commences, typically in a workshop setting, and is vital to the subsequent iterative phases embodied in the Iterative Process. The Core Process is a largely manual process, demanding interaction between experts and comprising the definition of target nodes, identification of key factors and grouping of subnetworks. In contrast, the Iterative Process, which consists of four iterative phases, can exploit many automated features of the BN modelling software application, in addition to the input from experts. The iterations explicitly focus on the definition, quantification, validation and evaluation of subnetworks prior to consideration of the whole model. For the purpose of modelling cheetah relocations, this was very useful for three reasons: a seemingly large task was broken down into more manageable components; there was early and continuous feedback to the participants; and the subnetwork summary (or target) nodes were of interest in themselves. The four iterative phases were continually revisited as the network structure and its quantification were crystallised.

Figures 5.3, 5.4 and 5.5 describe the IBNDC heuristic. Figure 5.3 is a visual representation of the key IBNDC concepts and illustrates the identification and definition of the outcome(s) of interest (target node(s)) as pivotal to all the subsequent steps (Varis and Kuikka, 1999). For example, key factors (nodes) are identified and described in relation to the outcome(s) of interest and consequently belated changes to the target node(s) may negate key factors described prior to the change. A unified modelling language (UML) Use Case diagram (Fig. 5.4) depicts the interactions between the expert teams and the IBNDC processes.
Figure 5.5 shows a UML Activity diagram with a detailed account of the steps involved in following the proposed IBNDC methodology.

Figure 5.3: Conceptual representation of the Iterative BN Development Cycle (IBNDC)


Figure 5.4: UML Use Case diagram showing the interactions between the expert teams (modelling and validation) and the IBNDC processes

Core Process

The three steps of the Core Process are shown in the IBNDC Conceptual diagram in Fig. 5.3. Step 1 occurs at the centre of the IBNDC, and is arguably the most important step of the process as this encapsulates the objective of the model: What issue do we want the model to address? (Varis and Kuikka, 1999). This is the end-point or final aim of the network and is represented in the BN as the target node. Careful definition of the target node is crucial to the structure, assumptions and identification of the key factors of the ensuing network. During the workshop much discussion focused on the definition of two target nodes, a successful relocation event (Success - site) and a viable wild cheetah population (Success – long term), expressed in such a way that they could be represented probabilistically. As represented by their insularity from the rest of the development cycle in Fig. 5.3, the target nodes were not changed once agreement had been reached by all stakeholders, as such changes would have negated the rest of the network.

Step 2 required not only the listing of relevant factors, but also their definition. Sticky notes were used to brainstorm these factors for the two target nodes. For Step 3, the sticky notes which logically belonged together were arranged into several smaller coherent groups, which would form the basis for subnetworks in the overall network. At this point the information was transferred into the Hugin® BN modelling software and the different groups of nodes were colour-coded for clarity. This marked the start of the Iterative Process.

Iterative Process

The Iterative Process was applied to each of the groups defined in the Core Process and then to the overall network. Iterations continued for a subnetwork (or overall network) until there were no more changes received from Phases 3R and 4R (Fig. 5.5). The first iteration of Phase 1R (Define/Modify) for the overall BN entailed reviewing the nodes that were defined in the Core Process and then creating a conceptual model of the network, including the placement of nodes and the connection of nodes through directed links. Ensuring accurate documentation of the node definitions is important in the interpretation of the interactions with other nodes. The node definitions were documented in Hugin® and then used to generate network documentation of the cheetah relocation BNs. Node definitions were frequently referred to during the development of the BN to ensure consistent interpretation of the factors by all experts. The second and subsequent iterations added, deleted or modified nodes and directed links, as dictated by the results from Phases 3R and 4R (Fig. 5.5). These iterations were performed for each of the subnetworks before considering the overall network.

Figure 5.5: UML Activity diagram demonstrating the IBNDC processes

In Phase 2R (Quantify) the states of the nodes were defined and the underlying conditional probability tables (CPT) populated. It is advisable to limit the number of states of a node and parent nodes to prevent unwieldy probability tables (Marcot et al., 2006). Nevertheless it can be quite a daunting task for experts to complete the CPTs (Pollino et al., 2007b), especially when there are subtle variations in the combination of states of the parent factors or when the combination of factors presents a theoretical rather than a realistic scenario. In these situations it was constructive to encourage the experts to prioritise the relative importance of the parent nodes. This enabled the experts to first populate the CPT values for the more plausible combinations, which they felt comfortable and confident about specifying. The remaining probabilities were calculated by using the relative weights (importance) of the parent nodes and states. Afterwards these calculated probabilities were reviewed and adjusted as directed by the experts and also as a result of the successive iterative phases in the IBNDC.

After each network had been quantified, it was tested in Phase 3R (Validate) to examine whether the predictions were consistent with known behaviour and whether the BN respected known causal relationships. This included reflection on the accuracy of predicted probabilities and whether the predictions respected expected patterns of change incurred as a result of changes in factor probabilities. The testing was primarily done using expert knowledge to interpret the observed behaviour of the network (or subnetwork) (Pollino et al., 2007b). If this was satisfactory, data conflict analysis was performed to ensure that the evidence entered was in line with the modelled structure.
If there were inconsistencies, these could be due to an error in the entered data (evidence), in one of the CPTs or in the directed links between the nodes. Inconsistent behaviour necessitated the reassessment of nodes, states and probabilities, which was addressed in the next iteration of Phases 1R and 2R. Further information on the data conflict analysis used in this study can be found on the Hugin® website (Hugin, 2007). In addition to the validation performed at the subnetwork level, the entire network was also tested by assessing the target node behaviour in two ways: (i) using only the subnetwork end-point nodes, that is, treating each subnetwork as a single node and (ii) using the input/observation nodes, that is, the leaf nodes of the network.

In Phase 4R (Evaluate), the subnetworks were evaluated through inference (de Waal and Ritchey, 2007), scenario testing (case studies from experts) and sensitivity analysis (Pollino et al., 2007a). Evaluation through inference was done by using the BN in a predictive mode (effect on survival if the states of particular factors are specified), prescriptive mode (best level of a factor if the states of other factors are specified) and diagnostic mode (circumstances corresponding to best or worst survival) (de Waal and Ritchey, 2007). Once the final evaluation (last iteration of Phase 4R) of the subnetworks and the entire network had taken place, an external evaluation of the network by another expert panel was conducted. Any structural and probabilistic changes suggested by this panel were submitted for confirmation by the original expert panel responsible for creating the BN. In the project described in this paper, discussion about the composition of the external expert panel and its role in determining the final network was deferred until the last iteration of Phase 4R. However, choosing an expert review team is not an iterative process and we therefore recommend that this activity is instead undertaken as part of the Core Process once the outcome of interest has been defined (Step 1). The expert modelling team will then be able to decide on a review panel who they feel is suitably qualified to review the BN being modelled.
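Phase 2R describes filling the remaining CPT entries from the relative weights (importance) of the parent nodes and states. The paper does not give the formula that was used, so the Python sketch below shows only one simple possibility: a normalised weighted sum of per-state scores. The parent names are borrowed from the fenced protected network, but the weights, the state scores and the function name are hypothetical and chosen purely for illustration.

```python
from itertools import product

# Hypothetical importance weights for three parents of a binary child node.
weights = {"Release type": 0.2, "Site factors": 0.5, "External Support": 0.3}

# Hypothetical favourability score in [0, 1] for each parent state (1 = most favourable).
scores = {
    "Release type": {"soft": 1.0, "hard": 0.4},
    "Site factors": {"good": 1.0, "poor": 0.0},
    "External Support": {"yes": 1.0, "no": 0.3},
}


def fill_cpt(weights, scores):
    """Return {parent state combination: (P(success), P(failure))} by weighted sum."""
    total_weight = sum(weights.values())
    parents = list(weights)
    cpt = {}
    for combo in product(*(scores[parent] for parent in parents)):
        p_success = sum(weights[parent] * scores[parent][state]
                        for parent, state in zip(parents, combo)) / total_weight
        cpt[combo] = (p_success, 1.0 - p_success)
    return cpt


cpt = fill_cpt(weights, scores)
for combo, (p_success, p_failure) in cpt.items():
    print(combo, f"success={p_success:.2f}", f"failure={p_failure:.2f}")
```

A table generated in this way is only a starting point. As the text notes, the calculated probabilities were subsequently reviewed and adjusted by the experts.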

5.3 Results

5.3.1 Core Process

The target nodes identified by the panel were short-term survival of the relocated cheetah (Success - site) and long-term cheetah population viability (Success – long term). Successful short-term survival is when recruitment exceeds adult death rate in a breeding population of cheetah during the three years post release (Hayward et al., 2007). Besides this definition for short-term survival, the expert panel identified additional indicators of short-term success as the ability of the cheetah to successfully hunt prey, successful socialisation with other cheetahs and capacity to breed. For females, capacity to breed was defined as successful reproduction of a first generation; for males, it was defined as successful reproduction of the male or his coalition or his ability to establish and hold a territory for 1.5 years. Long-term population viability was defined in accordance with the IUCN definition, including successful first generation reproduction and natural recruitment exceeding deaths (IUCN, 2007).

Although many of the key factors were endorsed by both countries, the relocation events were sufficiently different to be considered independently. The consequences of relocating to an unfenced versus a fenced site not only introduced additional factors, but also negated other factors and changed interactions between some factors. Similar differences were found in considering relocation into protected versus unprotected areas. For these reasons multiple networks commensurate with four types of relocation events were required for short-term survival of the relocated cheetah, with corresponding slight changes to the network for population viability. The four relocation events were (1) relocation into fenced protected areas; (2) relocation into unfenced protected areas; (3) relocation into unfenced unprotected areas; (4) in situ relocation. We describe below the OOBN for the first relocation event into fenced protected areas.
The OOBNs for the other three relocation events are shown in Appendix (A3 – A5).


Relocation into fenced protected areas

Figure 5.6 depicts the conceptual network developed for relocation into protected fenced areas and Table 5.3 in Appendix A1 contains descriptions of the nodes for this network.

Figure 5.6: Conceptual network for relocation into protected fenced areas showing the node groupings at the end of the Core Process. The nodes were assigned to six groups: Area Characteristics (green), Existing population (light blue), Management Issues (blue), External Support (yellow), Direct Factors (light green), Survival (orange). The node descriptions are in Table 5.3 and several CPTs are in the Appendix.

The probability of survival of a relocated cheetah in a fenced protected site was directly dependent on four factors: the Release type, Site factors, the existing cheetah population (Existing cheetahs) and External Support (Fig. 5.6). Whereas a hard release (Release Type) sees the animal released as soon as possible after capture, a soft release involves habituating the cheetah in a boma (a temporary holding facility suitable to keep the specific predator for a period of time, prior to release or for veterinary reasons) or similar enclosure for up to three months and is generally accepted to be the preferred method of relocating predators (Gusset et al., 2006). Although the Release Type can depend on the Reserve Objectives (hunting, conservation or ecotourism), current releases are almost exclusively soft in South Africa and hard in Botswana.

Site factors refer to the suitability of the site for relocation and include the existence of predator-proof fencing (Predator proof fence), Site-specific risks (Human, Disease, Neighbour Support and Environmental), Ecological suitability (Habitat type, Relocation site size, Prey availability and Predator threat) and the type of monitoring of cheetahs (Monitoring) by the owner of the site. The expert team determined the frequency of monitoring to be influenced by the Reserve Objectives: ecotourism reserves almost always have monitoring in place and most likely monitor the relocated cheetahs on a daily basis, whereas reserves with a conservation or hunting focus, although also likely to monitor the animals, usually do not monitor the cheetahs as frequently. To determine the Predator Threat, only lions and spotted hyena were considered since leopards are assumed to be always potentially present.

The expert team deemed the need to consider the suitability of the existing cheetah population (Existing cheetahs) with respect to the relocated animal to be a consequence of the increased management required in fenced areas (Site Metapopulation). The Population size after the proposed relocation event, its Population structure (gender, coalitions, age) and the Genetic relatedness dictate its suitability. The genetic relatedness of the resident cheetah population was strongly influenced by whether the site participated or showed a willingness to participate (Site Metapopulation) in a metapopulation plan (Metapopulation). The expert team argued that such a plan was an integral part of South African cheetah conservation and was particularly important for confined animals in fenced areas, but that no such plan existed for, or was relevant to, relocated cheetah populations in Botswana.

The second target node, long-term population viability (Success – long term), was influenced by the survival of the cheetah at the site (Success – site), the metapopulation plan (Metapopulation) and External support comprising both Community Support (Non Governmental Organisations and peer support) and Government Support. Support from farmer communities and conservation bodies is believed to carry a lot of weight with respect to the successful outcome of a relocation event and to the viability of the wild cheetah population. Support from the government includes the existence of positive legislation for relocation and the commitment to its implementation.

5.3.2 Iterative Process

Iterations of the four phases in this process were performed for each of the subnetworks after the outcomes of interest (target nodes) were clearly defined and the subnetworks conceptualised. The subnetworks in the protected fenced relocation BN have summary nodes and are colour coded. For example, in Fig. 5.6 the Ecological suitability subnetwork has nodes Habitat Type, Site Size, Relocation Site Size, Possible Expansion, Prey Availability, Predator Presence and Predator Threat, and summary (or child) node Ecological Suitability.

Phase 1R (Define/Modify) and Phase 2R (Quantify)

A BN is quantified by means of conditional probability tables (CPTs). Each node in the network has a probability table associated with it and the size of the table is determined by the parent nodes feeding into the particular node. For the protected fenced network, a total of 520 probabilities were elicited from the expert panel and the largest probability table was Site specific risks with 96 probabilities. Several CPTs for this network are included as an appendix to this paper, including two of the subnetworks (Ecological suitability and Site factors) for the OOBNs created for the different types of relocation events.
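The size of a CPT follows from the number of states of the node and of its parents. As a minimal sketch, the count of 96 probabilities for Site specific risks can be reproduced under two assumptions that are our reading of the text rather than a statement of the model file: that its parents are Human, Disease, Neighbour Support and Environmental (as listed in the description of Fig. 5.6), and that the state counts are those given in Table 5.3. The function name `cpt_size` is hypothetical.

```python
from math import prod

# Number of states per node, as read from Table 5.3 (an assumption of this sketch).
state_counts = {
    "Site specific risks": 3,   # None, Low, High
    "Human": 4,                 # None, Settlement, Agricultural, Both
    "Disease": 2,               # Normal, Exceptional
    "Neighbour Support": 2,     # Yes, No
    "Environmental": 2,         # Normal, Exceptional
}

# Parent set of "Site specific risks", as read from the description of Fig. 5.6.
parents = ["Human", "Disease", "Neighbour Support", "Environmental"]


def cpt_size(node, parent_nodes, counts):
    """Entries in a CPT: one probability per child state per parent configuration."""
    return counts[node] * prod(counts[p] for p in parent_nodes)


site_risk_entries = cpt_size("Site specific risks", parents, state_counts)
print(site_risk_entries)  # 3 * 4 * 2 * 2 * 2 = 96
```

Because the table grows multiplicatively with each additional parent, grouping factors under summary nodes keeps the number of probabilities to be elicited manageable.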


Phase 3R (Validation) and Phase 4R (Evaluation)

Validation and evaluation of the networks were done by the workshop expert panel using case studies of known relocation sites and the history of relocation events at those sites, and by running ‘what if’ scenarios to verify that the model behaves in accordance with known situations. The networks were also reviewed by two cheetah experts in Botswana who were not part of the workshop panel developing these relocation networks.

Moreover, it is important to identify those model parameters for which variations in CPT values produce the greatest changes in the network end points (parameter sensitivity). Further attention must be paid to these nodes to ensure that their CPTs are precise (Laskey, 1995; Pollino et al., 2007b). The sensitivity of the target nodes to variations in the evidence entered into the BN also needs to be assessed (evidence sensitivity) (Varis and Kuikka, 1999; Bednarski et al., 2004; Pollino et al., 2007b). Sensitivity analysis was therefore performed on the two end points, success of a relocation event (Success - site) and long-term population viability (Success - long term). We discuss here sensitivity analysis for the protected fenced network.

Evidence sensitivity measures the degree of variation in the BN’s posterior distribution resulting from changes in the evidence being entered in the network. Ranking the evidence nodes accordingly assists the expert in targeting future data collection and in identifying any errors in the BN structure or CPTs (Pollino et al., 2007b). Two popular ways in which to measure evidence sensitivity are entropy and mutual information (Pollino et al., 2007b). Entropy, H(X), measures the randomness of a variable and is calculated as follows (Pearl, 1988; Korb and Nicholson, 2004; Pollino et al., 2007b):

$$H(X) = -\sum_{x} P(x) \log P(x) \qquad (1)$$

where P(x) is the probability distribution of X.

The entropy values for the protected fenced BN are shown in Table 5.1. These results show that the type of neighbouring property (Human) and the composition of the existing cheetah population at the site (Existing cheetah population, Population size, Population structure) as well as the threat posed by predators (Predator Threat) at the relocation site cause the largest variation in the BN’s posterior distribution.


Table 5.1: Evidence sensitivity analysis for posterior network (protected fenced BN), showing calculated entropy

| Node | Entropy |
|---|---|
| Success - long term | 0.1983 |
| Success - site | 0.3555 |
| Human | 1.1922 |
| Population size | 1.0889 |
| Population structure | 1.0297 |
| Predator threat | 1.0097 |
| Existing cheetahs | 0.9700 |
| Government support | 0.8979 |
| Reserve objectives | 0.8979 |
| Site specific risks | 0.8547 |
| Monitoring | 0.8207 |
| Community support | 0.8018 |
| Site size | 0.8018 |
| Existing cheetahs | 0.6908 |
| Possible expansion | 0.6730 |
| Metapopulation | 0.6720 |
| Ecological suitability | 0.6559 |
| Predator presence | 0.6474 |
| Relocation site size | 0.5792 |
| Site factors | 0.5425 |
| Genetic relatedness | 0.4314 |
| Release type | 0.3669 |
| Disease | 0.3251 |
| Environmental | 0.3251 |
| Neighbour support | 0.1985 |
| Site metapopulation | 0.1985 |
| Habitat type | 0.0000 |
| Predator proof fence | 0.0000 |
| Prey availability | 0.0000 |

The two end points of the BN (Success - long term and Success - site) are listed at the top of the table and are reference points for the other nodes.
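The entropy calculation in Eq. (1) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `entropy` is hypothetical, the example distributions are not taken from the cheetah networks, and the logarithm base is left as a parameter because the text does not state which base was used for Table 5.1.

```python
import math


def entropy(probs, base=2.0):
    """Shannon entropy H(X) = -sum P(x) log P(x) of a discrete distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # States with P(x) = 0 contribute nothing (limit of p * log p as p -> 0).
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)


# Illustrative distributions (not values from the relocation BN).
uniform_binary = entropy([0.5, 0.5])   # maximal uncertainty for two states
certain = entropy([1.0, 0.0])          # a node whose state is known
```

A node whose state is fixed has zero entropy, which is one way to read the 0.0000 entries at the bottom of Table 5.1, while a node with probability spread evenly over its states has the largest possible entropy for its number of states.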

The other measure of evidence sensitivity is mutual information, I(X,Y), which gives an indication of the effect that one random variable, X, has on another variable, Y, and is calculated as follows (Korb and Nicholson, 2004; Pollino et al., 2007b):

$$I(X,Y) = H(X) - H(X \mid Y) \qquad (2)$$
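Equation (2) can be illustrated with a small sketch that computes mutual information from a joint probability table. The function names and the two example joint distributions are hypothetical and are not derived from the relocation BN; base-2 logarithms are an assumption of the sketch.

```python
import math


def entropy(probs, base=2.0):
    """Shannon entropy of a discrete distribution; zero-probability states are skipped."""
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)


def mutual_information(joint, base=2.0):
    """I(X,Y) = H(X) - H(X|Y), where joint[i][j] = P(X = x_i, Y = y_j)."""
    p_x = [sum(row) for row in joint]            # marginal of X (row sums)
    p_y = [sum(col) for col in zip(*joint)]      # marginal of Y (column sums)
    h_x_given_y = 0.0
    for j, py in enumerate(p_y):
        if py > 0.0:
            conditional = [row[j] / py for row in joint]   # P(X | Y = y_j)
            h_x_given_y += py * entropy(conditional, base)
    return entropy(p_x, base) - h_x_given_y


# Hypothetical joint tables for two binary variables.
independent = [[0.25, 0.25], [0.25, 0.25]]   # knowing Y says nothing about X
dependent = [[0.5, 0.0], [0.0, 0.5]]         # knowing Y determines X

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

Mutual information is zero when the two variables are independent and equals H(X) when Y determines X, so ranking nodes by their mutual information with a target node orders them by how much observing them reduces uncertainty about the target.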

The mutual information results between the node representing the success of a relocation (Success – site) and the other factors in the protected fenced network are shown in Table 5.2. This table clearly shows that the factors at the site (Site Factors) have the largest effect. This node is a function of the presence of predator proof fencing for neighbouring properties (Predator proof fence), the extent of Monitoring of the released cheetahs, any inherent risks at the site (Site specific risks) and the Ecological suitability of the site. The latter is also calculated to have the next largest effect, followed closely by the Existing cheetah population at the site (Existing cheetahs).

Table 5.2: Mutual information between the target node (Success-site) of the Protected Fenced Relocation BN and the other variables

| Node | Mutual information |
|---|---|
| Site factors | 0.1041 |
| Ecological suitability | 0.0657 |
| Existing cheetahs | 0.0557 |
| Predator threat | 0.0274 |
| Predator presence | 0.0199 |
| Relocation site size | 0.0175 |
| Population structure | 0.0121 |
| Population size | 0.0113 |
| Monitoring | 0.0098 |
| Site size | 0.0030 |
| Reserve objectives | 0.0027 |
| Release type | 0.0019 |
| Genetic relatedness | 0.0012 |
| Site specific risks | 0.0010 |
| Possible expansion | 0.0005 |
| Site metapopulation | 0.0002 |
| External support | 0.0020 |
| Disease | 0.0001 |
| Metapopulation | 0.0001 |
| Human | 0.0001 |
| Government support | 0.0001 |
| Environmental | 0.0000 |
| Community support | 0.0000 |
| Neighbour support | 0.0000 |
| Habitat type | 0.0000 |
| Prey availability | 0.0000 |
| Predator proof fence | 0.0000 |

Next the protected fenced BN was inspected for parameter sensitivity using one-way sensitivity analysis, in which one of the parameters (here Predator presence) is varied while all the others are kept fixed, and the variation in the output probability (Success – site) is measured (Bednarski et al., 2004). To do this, a sensitivity function is required that expresses the output probability in terms of the parameter, x, being varied. This sensitivity function is defined in Eq. (3) below and is the quotient of two linear functions in the parameter being varied (van der Gaag et al., 2007).

$$f(x) = \frac{\alpha x + \beta}{\gamma x + \delta} \qquad (3)$$

where α, β, γ and δ are constants built from the parameters which are fixed.


The sensitivity value of the parameter x and the target probability can be obtained by taking the first derivative of the sensitivity function (Laskey, 1995; van der Gaag et al., 2007) and is given by the following equation:

$$f'(x) = \frac{\alpha\delta - \beta\gamma}{(\gamma x + \delta)^{2}} \qquad (4)$$

Figure 5.7 below shows the sensitivity of the success of a relocation event (output probability) to variations in the values for no predator presence.

Figure 5.7: Parameter sensitivity graph showing the slope of change for high and low predator presence at the relocation site. The observed posterior probabilities for a successful relocation event (Success – site) are shown on the y-axis and the changes in conditional probabilities for predator presence are on the x-axis.

Similar to successful relocation at a site, the long-term cheetah population viability is also sensitive to changes in the presence of predators. Furthermore, government support, community support and the existence of a metapopulation plan play an important role for long-term viability, whereas for a single successful relocation they were not particularly relevant. Importantly, the long-term viability is most sensitive to the success of the individual relocation events.

In the unfenced networks a successful relocation (Success – site) was still sensitive to Predator presence, but not to the same extent as in fenced areas. In addition, support from the government (Government Support) and the wider community (Community Support) featured more prominently than was the case in fenced areas. The success of a relocation event is sensitive to changes in the distance of human settlements (Distance from settlements/farms) to the relocation site as well as to risk factors in the relocation area (Site specific risks) that could cause disease, injuries and other adverse effects on the relocated cheetahs. These site-specific risks include traditional healers, land claims, disease, old agricultural land and floods.
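The one-way sensitivity function and its slope can be sketched as follows, assuming the quotient-of-linear-functions form f(x) = (αx + β)/(γx + δ) with derivative (αδ − βγ)/(γx + δ)². The constants used here are hypothetical values chosen for illustration; in practice they would be obtained from the network for a chosen parameter and target probability.

```python
def sensitivity(x, alpha, beta, gamma, delta):
    """Output probability as a quotient of two linear functions of parameter x."""
    return (alpha * x + beta) / (gamma * x + delta)


def sensitivity_slope(x, alpha, beta, gamma, delta):
    """First derivative of the sensitivity function with respect to x."""
    return (alpha * delta - beta * gamma) / (gamma * x + delta) ** 2


# Hypothetical constants, not taken from the cheetah relocation BN.
constants = dict(alpha=0.3, beta=0.1, gamma=0.5, delta=1.0)
x0 = 0.4

analytic = sensitivity_slope(x0, **constants)

# Central finite difference as an independent check of the derivative.
h = 1e-6
numeric = (sensitivity(x0 + h, **constants) - sensitivity(x0 - h, **constants)) / (2 * h)
```

A large absolute slope at the elicited value of a parameter indicates that small errors in that CPT entry translate into noticeable changes in the target probability, which is why such parameters deserve the most careful elicitation.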


Figure 5.8: Predictive testing – case study 1

Figure 5.8 illustrates field testing of the network using a case study derived from the experts’ experience. The target nodes (Success – site and Success – long term) need to reflect expected predictive patterns and/or known outcomes from the case study. The experts then kept all states of the factors the same, except for setting predators to be present. The change in the probability of success of the two critical events was noticeable, with relocation success dropping from 32.56% to 10.19% and long-term viability dropping from 26.93% to 8.43%. This also endorses earlier findings from the sensitivity analysis that Predator presence strongly influences the probability of success.
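The size of the change reported for case study 1 can be expressed as a relative drop. The short calculation below uses only the four percentages quoted above; the helper name `relative_drop` is hypothetical.

```python
def relative_drop(before, after):
    """Fraction of the original probability lost when moving from before to after."""
    return (before - after) / before


# Percentages quoted for case study 1 (predators absent -> predators present).
site_drop = relative_drop(32.56, 10.19)        # Success - site
long_term_drop = relative_drop(26.93, 8.43)    # Success - long term

print(f"Success - site: {site_drop:.1%}")
print(f"Success - long term: {long_term_drop:.1%}")
```

Both target probabilities fall by roughly the same proportion, a little over two thirds, which is consistent with the observation that long-term viability is most sensitive to the success of the individual relocation events.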

5.4 Discussion

This study investigated the use of a Bayesian network model to integrate, structure, and clarify human expertise on a composite problem, the relocation of cheetahs in two southern African countries, using an original heuristic method, an iterative development cycle (IBNDC). The expected advantages are the consolidation of the resulting overall BN implementation and a continuous improvement of the model with incoming expertise from new case studies. This approach has been developed using a combination of several existing BN types and suggests a new approach to implementing BNs in a multi-expert and multi-field domain. While Bayesian Networks are not a new approach to ecological modelling, deriving their structure is particularly difficult, as is populating them with data. We outline a more iterative way of doing so that is conducive to ideas on adaptive management.

The IBNDC complements the three-level BN approach to modelling suggested by Marcot et al. (2006) and focuses on the iterative nature of BN modelling. Essentially the IBNDC is always a work in progress, with the first step in the iterative process checking whether the BN needs modifying in light of new expert knowledge and information. Therefore any version of the network is a snapshot of the most current expert knowledge and evidence available at that time, and as new evidence and knowledge come to light, the BN model is continually revised and refined. Information on cheetah relocations is sparse, the benefits of various techniques are still being investigated and new information on relocation events is continually becoming available, especially with increased monitoring in place. For these reasons the relocation BN was ideally suited to the IBNDC process. Once the Bayesian network has successfully completed the IBNDC procedure using the available current expert knowledge and information, it can be employed as a management support tool.

The areas into which the two countries are able to relocate problem cheetahs differ significantly, and certain factors considered important in certain situations may be less important or totally irrelevant in other situations. Pinpointing the factors and subnetworks pertinent to all BNs was important to the understanding of the crucial factors in cheetah relocations, and these factors would be candidates for consideration in other predator relocations. Furthermore, the relocation events were considered in the context of the wider metapopulation viability. Some factors central to the success of a relocation event may also play a significant role in the metapopulation viability. Particularly in South Africa, management of the metapopulation of cheetahs in small fenced reserves is becoming a challenge. Due to the small size of populations inside fenced reserves, intensive management is required to prevent inbreeding and local overpopulation, and to ensure long-term sustainability of the cheetah metapopulation. Although the cheetah relocation BN demonstrates an application of the IBNDC to cheetah conservation, relocation is just one option among a suite of tools used to resolve human-cheetah conflict in southern Africa.
There are several possible applications of BNs in a conservation management support environment, such as:

• Calculating risk associated with management decisions
• A tool for negotiation, for example when consulting with reserves which are suitable as relocation sites
• Illustrating trade-offs between various relocation sites and reserves
• Training tool to introduce newcomers to the management process of wild cheetah relocations.

The IBNDC process prioritises future data collection as part of the iterative process, thereby facilitating continuous improvement of this tool. While relocations can be successful in a specific reserve, they require intensive and expensive management to be viable in the long term. The emphasis should be on conserving cheetahs in situ and, as indicated above, in this situation the BN can be used as an effective negotiation tool with landowners and stakeholders. The IBNDC procedure can also be used for the development of other useful management tools to guide decision making in the management of the cheetah metapopulation.

Acknowledgements

We wish to thank De Wildt Trust for their financial and practical support in hosting the main workshop. We especially appreciated the excellent cooking and great hospitality of Eloise and her staff at De Wildt Lodge. We also thank Mokolodi Nature Reserve for hosting the second workshop. For constructive comments on the manuscript we thank two anonymous reviewers.

5.5 References

BCP, 2002. Botswana Country Profile – United Nations World Summit, Johannesburg, South Africa. http://www.un.org/esa/agenda21/natlinfo/wssd/botswana.pdf, retrieved 14 October 2007.
Bednarski, M., Cholewa, W. and Frid, W., 2004. Identification of sensitivities in Bayesian networks. Engineering Applications of Artificial Intelligence 17, 327-335.
Borsuk, M.E., Reichert, P., Peter, A., Schager, E., Burkhardt-Holm, P., 2006. Assessing the decline of brown trout (Salmo trutta) in Swiss rivers using a Bayesian probability network. Ecological Modelling 192, 224-244.
Bosch, O.J.H., Ross, B.J. and Beeton, R.J.S., 2003. Integrating science and management through collaborative learning and better information management. Systems Research and Behavioural Science 20, 107-118.
Castelletti, A., Soncini-Sessa, R., 2007. Bayesian networks and participatory modelling in water resource management. Environmental Modelling and Software 22, 1075-1088.
Crooks, K.R., Sanjayan, M.A., Doak, D.F., 1998. New Insights on Cheetah Conservation through Demographic Modeling. Conservation Biology 12, 889-895.
Cummings, D.H.M., 1991. Developments in game ranching and wildlife utilisation in east and southern Africa, In Wildlife Production: Conservation and Sustainable Development. eds L.A. Renecker, R.J. Hudson, pp. 96-108. University of Alaska, Fairbanks.
De Waal, A., Ritchey, T., 2007. Combining Morphological Analysis and Bayesian Networks for Strategic Decision Support. ORiON 23, 105-121.
Durant, S.M., 2000. Living with the enemy: avoidance of hyenas and lions by cheetahs in the Serengeti. Behavioural Ecology 11, 624-632.
Durant, S.M., Kelly, M.J. and Caro, T.M., 2004. Factors affecting life and death in Serengeti cheetahs: environment, age, and sociality. Behavioural Ecology 15, 11-22.
Ghahramani, Z., 2001. An Introduction to Hidden Markov Models and Bayesian Networks. International Journal of Pattern Recognition and Artificial Intelligence 15, 9-42.
GCCAP, 2002. Global Cheetah Conservation Action Plan Workshop Report, In Global Cheetah Conservation Action Plan Workshop. p. 77, Shumba Valley Lodge, South Africa.
Gusset, M., Slotow, R., Somers, M.J., 2006. Divided we fail: the importance of social integration for the re-introduction of endangered African wild dogs (Lycaon pictus). Journal of Zoology 270, 502-511.
Hamilton, G.S., Fielding, F., Chiffings, A.W., Hart, B.T., Johnstone, R.W., Mengersen, K., 2007. Investigating the Use of a Bayesian Network to Model the Risk of Lyngbya majuscula Bloom Initiation in Deception Bay, Queensland. Human and Ecological Risk Assessment 13, 1271-1279.


Hayward, M.W., Adendorf, J., O’Brien, J., Sholto-Douglas, A., Bissett, C., Moolman, L.C., Bean, P., Fogarty, A., Howarth, D., Slater, R., Kerley, G.I.H., 2007. The re-introduction of large carnivores to the Eastern Cape, South Africa: an assessment. Oryx 41, 205-213.
Hugin, 2007. Hugin API Reference Manual Version 6.7, http://www.hugin.com/developer/documentation
IUCN, 2007. The IUCN Red List of Threatened Species.
Jensen, F.V., Nielsen, T.D., 2007. Bayesian Networks and Decision Graphs. Springer Science + Business Media, LLC.
Klein, R., 2007. Status Report for the Cheetah in Botswana, In The Status and Conservation Needs of the Cheetah in Southern Africa. eds C. Breitenmoser, S.M. Durant, pp. 14-21. CAT News Special Edition, December 2007.
Korb, K.B., Nicholson, A.E., 2004. Bayesian Artificial Intelligence. Chapman & Hall/CRC, London.
Koller, D. and Pfeffer, A., 1997. Object-Oriented Bayesian Networks. In: Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-97), Providence, Rhode Island, pp 302-13.
Laskey, K.B., 1995. Sensitivity Analysis for Probability Assessments in Bayesian Networks. IEEE Transactions on Systems, Man and Cybernetics 25, 901-909.
Laurenson, M.K., Wielebnowski, N., Caro, T.M., 1995. Extrinsic Factors and Juvenile Mortality in Cheetahs. Conservation Biology 9, 1329-1331.
Lindsey, P.A., Alexander, R., Du Toit, J.T., Mills, M.G.L., 2005. The Cost Efficiency of Wild Dog Conservation in South Africa. Conservation Biology 19, 1205-1214.
Marcot, B.G., Steventon, J.D., Sutherland, G.D., McCann, R.K., 2006. Guidelines for developing and updating Bayesian belief networks applied to ecological modeling and conservation. Canadian Journal of Forest Research 36, 3063-3074.
Marker, L., 2002. Aspects of Cheetah (Acinonyx jubatus) Biology, Ecology and Conservation Strategies on Namibian Farmlands. Page 516. Department of Zoology. University of Oxford, Oxford.
Marker, L.L., Dickman, A.J., Jeo, R.M., Mills, M.G.L., MacDonald, D.W., 2003. Demography of the Namibian cheetah, Acinonyx jubatus jubatus. Biological Conservation 114, 413-425.
Marnewick, K., Beckhelling, A., Cilliers, D., Lane, E., Mills, G., Herring, K., Caldwell, P., Hall, R., Meintjes, S., 2007. The Status of the Cheetah in South Africa, In The Status and Conservation Needs of the Cheetah in Southern Africa. eds C. Breitenmoser, S.M. Durant, pp. 22-31. CAT News Special Edition, December 2007.
Merola, M., 1994. A Reassessment of Homozygosity and the Case for Inbreeding Depression in the Cheetah, Acinonyx jubatus: Implications for Conservation. Conservation Biology 8, 961-971.
Murphy, K.P., 2002. Dynamic Bayesian Networks: Representation, Inference and Learning, In Graduate Division. p. 225. University of California, Berkeley.
Myers, N., 1975. The cheetah Acinonyx jubatus in Africa. Report of a survey in Africa from the Sahara southwards. IUCN/WWF joint project. IUCN Monograph No. 4, International Union for Conservation of Nature and Natural Resources, Morges.

Neapolitan, R., 1990. Probabilistic Reasoning in Expert Systems: Theory and Algorithms. John Wiley, New York.
O’Brien, S.J., 1994. The Cheetah Conservation Controversy. Conservation Biology 8, 1153-1155.
Park, M.H., Stenstrom, M.K., 2008. Classifying environmentally significant urban land uses with satellite imagery. Journal of Environmental Management 86, 181-192.
Pearl, J., 1988. Networks of Plausible Inference, In Probabilistic Reasoning in Intelligent Systems. p. 543. Morgan Kaufmann Publishers Inc, San Francisco, California.
Pollino, C.A., White, A.K., Hart, B.T., 2007a. Examination of conflicts and improved strategies for the management of an endangered Eucalypt species using Bayesian networks. Ecological Modelling 201, 37-59.
Pollino, C.A., Woodberry, O., Nicholson, A., Korb, K., Hart, B.T., 2007b. Parameterisation and evaluation of a Bayesian network for use in an ecological risk assessment. Environmental Modelling & Software 22, 1140-1152.
Purchase, G.K., Marker, L., Marnewick, K., Klein, R., Williams, S., 2007. Regional assessment of the status, distribution and conservation needs of cheetahs (Acinonyx jubatus) in southern Africa: Summary of Country Reports. CAT News, Special Edition 3, 43.
Ross, B.J. and Zuviria, E., 2007. Evolving dynamic Bayesian networks with multi-objective genetic algorithms. Applied Intelligence 26, 13-23.
Smith, C.S., Howes, A.L., Price, B., McAlpine, C.A., 2007. Using a Bayesian belief network to predict suitable habitat of an endangered mammal: The Julia Creek dunnart (Sminthopsis douglasi). Biological Conservation 139, 333-347.
Taroni, F., Aitken, C., Garbolino, P., Biedermann, A., 2006. Bayesian Networks and Probabilistic Inference in Forensic Science. John Wiley & Sons, Ltd.
Uusitalo, L., 2007. Advantages and challenges of Bayesian networks in environmental modelling. Ecological Modelling 203, 312-318.
Van der Gaag, L.C., Renooij, S. and Coupe, V.M.H., 2007. Sensitivity Analysis of Probabilistic Networks. Advances in Probabilistic Graphical Models. Springer Berlin / Heidelberg, pp. 103-124.
Varis, O. and Kuikka, S., 1999. Learning Bayesian decision analysis by doing: lessons from environmental and natural resources management. Ecological Modelling 119, 177-195.
Weber, P. and Jouffe, L., 2006. Complex system reliability modelling with Dynamic Object Oriented Bayesian Networks (DOOBN). Reliability Engineering & System Safety 91, 149-62.


5.6 Appendix A1

Table 5.3: Description of nodes of Protected Fenced Relocation BN (Fig. 5.6)

| Node | Description | States |
|---|---|---|
| Community Support | Community and peer support: farmer communities, conservation bodies, NGOs. | High, Low, None |
| Disease | Catastrophe according to veterinary department. Distemper, rabies, parvo virus if present in high numbers would affect the choice of using that area as a release site. | Normal, Exceptional |
| Ecological suitability | Summary node | Suitable, Not Suitable |
| Environmental | Disasters - floods, drought, fire | Normal, Exceptional |
| Existing cheetahs | Summary node for existing cheetah population at the site: Suitable/not suitable for introduction | |
| External Support | Summary node | Good, Limited, None |
| Genetic relatedness | The relatedness between existing cheetahs and the proposed introduced cheetahs | Yes, No, Unknown |
| Government Support | Involves both legislation (for relocation) and implementation. The extent to which the government supports relocation activities. States: No - relocations are not permitted; Limited - some support with lobbying, lack of capacity, with non-applicable or outdated legislation, legislation present but not informed; Yes - good legislation and implementation, proactive, informed, capable | Yes, Limited, No |
| Habitat type | Whether habitat is suitable for cheetah relocation | Suitable, Not Suitable |
| Human | Human factors in conflict with cheetahs. Cheetahs will get in conflict when they: 1. eat livestock/game of value; 2. are perceived to be a threat to people | None, Settlement, Agricultural, Both |
| Metapopulation | Participation of all the reserves in a country-wide network of fragmented reserves | Yes, No |
| Monitoring | Direct or indirect monitoring (using GPS/satellite/VHF collars), indication of the level of monitoring in place. Daily: daily visuals or locations of all the cats (you have a signal and the signal is moving), direct or indirect monitoring; Less frequent: anything less than daily | Daily, Less Frequent, Absent |
| Neighbour Support | Neighbouring support for relocation by surrounding communities | Yes, No |
| Population size | The cheetah population size after relocation event | Under Capacity, Within Capacity, Over Capacity, None |
| Population structure | Consider: male/female ratio, dominant coalition size, age of cheetahs currently in release site | Suitable, Adjustments, Not Suitable |
| Possible expansion | Possible expansion of relocation site? | Yes, No |
| Predator presence | Presence of lions & spotted hyenas. Leopard presence taken as given. | Yes, No |
| Predator proof fence | Predator proof fencing of neighbouring properties or release site | No - no predator proof fence, or not maintained; Yes - well maintained |
| Predator Threat | Current or future predator threat of lions and hyena. Leopards assumed present | High, Low, No |
| Prey Availability | Suitable prey for cheetahs (Sufficient: self-sustaining for 2 years before reintroducing new prey) | Sufficient, Not sufficient |
| Release type | Whether hard or soft release. Hard: released asap from capture to new area; Soft: spends time (few weeks) in enclosure in release site before released | Hard, Soft |
| Relocation site size | Summary node for site size | Adequate, Not Adequate |
| Reserve Objectives | Main focus of reserve (hunting, tourism, photography) | Conservation, Ecotourism, Hunting |
| Site factors | Summary node for factors affecting site | |
| Site Metapopulation | Site-specific participation in the metapopulation plan or indication of willingness to participate | Yes, No |
| Site size | Relocation site size: min - 2000ha; small - 5000ha; medium - 5000 - 15000ha; large - > 15000ha | Large, Medium, Small, Min |
| Site specific risks | Risk factors of relocation area: including current disease levels of residing predators/livestock, injury potential due to site conditions or other problems etc. | None, Low, High |
| Success - long term | Long-term viability of wild cheetah metapopulations, defined in accordance with the IUCN definition | Yes, No |
| Success - site | Successful relocation with respect to site. Survival of relocated cheetahs into fenced, protected areas. Individual survival: 1. Capable of breeding 2. Interact socially 3. Hunt for themselves | Yes, No |

A2

CPTs for Protected Fenced Relocation BN (Fig. 5.6, Table 5.3)


A3

Motivation for nodes in Cheetah Relocation BNs

Target Nodes

A study into the reproduction and survival of wild cheetah populations in the Serengeti concluded that the greatest influence on the growth rate of a wild cheetah population is the survival of the adult cheetahs, rather than the juveniles (Crooks et al., 1998). Consequently, successful relocation of captured adult cheetahs (Success – site) is important to preserve wild relocated cheetah populations (Success – long term).

Key Factors (Nodes)

In the Serengeti, predation by lion (Panthera leo) and spotted hyena (Crocuta crocuta) is the main cause of cheetah cub mortality. These predators may also rob adult cheetahs of their kill and lions have been known to kill adult cheetahs (Durant, 2000). The estimated mortality rate of cheetah cubs in Serengeti National Park is 95%, with predation by lion and spotted hyena accounting for 73.2% of the deaths (Laurenson et al., 1995). The existence of hyenas and lions at a proposed relocation site (Predator presence) is therefore a major consideration in determining the suitability of the site for relocation and the successful outcome of the relocation event. Leopard (Panthera pardus) also poses a threat to cheetahs (Marnewick et al., 2007), but leopards were assumed to always be present at relocation sites and therefore no special attention was given to them in the predator threat assessment (Predator Threat).

As mentioned in Section 5.1, in addition to this depredation by lions and hyenas, increased contact and conflict with humans (Human), fragmented habitat and habitat site size (Habitat type) also pose major threats to cheetah survival (GCCAP, 2002). The lack of genetic diversity (Genetic relatedness) may also have serious implications for the survival of the species as it adversely affects reproduction (O’Brien, 1994) and may make the animals less adaptable to ecological changes (GCCAP, 2002). Although genetic relatedness and loss of habitat are both acknowledged to be important, their relative priority is still under debate (Merola, 1994; O’Brien, 1994).

Adult male survival is substantially lower than that of adult females, but increases for those in male coalitions (Durant et al., 2004). The survival rate depends on the size of the coalition as well as the number of male coalitions in that area. Durant et al. (2004) observed that adolescent groups had a better chance of survival than singleton cheetah adolescents and that males in adolescent groups relied heavily on their sisters to provide food for the group. For these reasons the existing cheetah population size (Population size) and structure (Population structure) at the relocation site, and the group makeup of the cheetahs to be relocated, are important considerations.


A4

Relocation into protected unfenced areas

In contrast with the network in Fig. 5.6, survival of a relocated cheetah in a protected unfenced area (Fig. 5.9) was much more dependent on human factors (Human). As a consequence, this node was linked directly to the target node (Success – site). Human factors (Human) represent the measure of conflict between humans and cheetahs caused primarily by cheetahs eating (or suspected of eating) livestock or game of value, or because they are perceived to pose a threat to people. The degree of conflict is influenced by the distance of the relocation site from a settlement (Distance from settlements/farms), the level of livestock/game protection (Livestock/Game Protect) and community support in these settlements (Neighbouring support), as well as the type of protection in that area (Reserve Type). The reserve type may be a wildlife management area (WMA), typically present on the boundaries of national parks, or a strict conservation area such as a game reserve or national park. In WMAs there may be human settlements and hunting may be permitted, whereas in game reserves and national parks such activities are forbidden. Survival was also directly influenced by whether the animal was monitored by the reserve or conservation organisation (Monitoring). Community support for a relocated cheetah in a neighbouring reserve (Neighbouring support) is strongly influenced by the nature and degree of education activities of the local community in the form of school visits, farmer workshops and community forums (Site Education). These education activities are largely introduced and undertaken by conservation organisations. Support for cheetah relocation by the neighbouring community was more likely if the relocated cheetahs were being monitored.

Figure 5.9: Conceptual network for relocation into protected unfenced areas showing the node groupings at the end of the Core Process. The nodes were assigned to six groups: Area Characteristics (green), Existing population (light blue), Human Factors (pink), External Support (yellow), Direct Factors (light green), Survival (orange).

In protected unfenced areas, the relocation site has no fence to contain wildlife and hence the suitability of the relocation site no longer depends on its size (Relocation site size). Furthermore the degree of Genetic Relatedness was removed as a factor in assessing the suitability of the Existing cheetah population. Instead, its suitability was dependent on whether the site is within or outside the Home range of the animal, and the other cheetahs that had been relocated there recently (Release history). Since a metapopulation plan is not relevant for unfenced sites, the second target node of long-term population viability (Success – long term) was dependent on just three factors: short-term survival of the relocated cheetah (Success - site), the Existing cheetah population at the site and External Support.

A5

Relocation into unprotected unfenced areas

South Africa only relocates into protected areas and therefore does not regard relocation into unprotected areas as a viable option. However, these areas may be considered for relocation in Botswana since the majority of suitable relocation sites would be unfenced. This network, which is shown in Fig. 5.10, replicates that constructed for relocation into protected unfenced areas, with two main exceptions. First, the Reserve type is no longer a factor. Second, although Predator threat is still a consideration in the Ecological suitability of a site, the existence of predators (Predator presence) in these areas is no longer deemed to have an effect on predator threat. This is due mainly to the absence of fences, which translates to a lower density of predators.

Figure 5.10: Conceptual network for relocation into unprotected unfenced areas showing the node groupings at the end of the Core Process. The nodes were assigned to six groups: Area Characteristics (green), Existing population (light blue), Human Factors (pink), External Support (yellow), Direct Factors (light green), Survival (orange).

A6

In situ relocation

Figure 5.11 illustrates the conceptual network for an in situ relocation and as can be seen from the diagram, the BN that describes this relocation event is very simple. Only the first target node is considered (Success - site), since if the cheetah is left in situ, the long-term population viability of the cheetah population (Success – long term) is not affected. There is no consideration of Site factors or the Existing cheetah population, but instead support by the government (Government Support), community (Community Support) and the individual landowner (Ind Landowner Support) play important roles. Illegal Trade is also a major consideration. Both community and individual landowner support are influenced by the Level of conflict with cheetahs in the area, existing livestock protection practices (Livestock Protection), education about cheetahs (Education) and whether the cheetah is monitored by the conservation project or program (Monitoring).

Figure 5.11: Conceptual network for in situ relocation showing only two groups, Survival (orange) and the other nodes belong to a default grouping (light yellow)

A7

Ecological suitability

Figure 5.12: Ecological suitability subnetworks for relocation on protected fenced sites (top), protected unfenced sites (left) and unprotected unfenced sites (right).

Figure 5.12 shows the subnetworks corresponding to the ecological suitability of a site for cheetah relocation. For all events, a site was ecologically unsuitable (Ecological suitability) if Habitat type or availability of prey (Prey availability) was unsuitable. For protected, fenced areas a site was also unsuitable if the Site size was unsuitable. Predator Threat was the dominant factor in all cases. Table 5.4 is an illustration of the conditional probability table for Predator Threat. The first numerical value (0.99) can be interpreted as follows: Given that the site size is minimum and predators are present, the probability for a high predator threat is 0.99. These are the prior probabilities for Predator Threat. When the BN is compiled and run we obtain the posterior probabilities obtained from the priors and the effects of the directed links (Varis and Kuikka, 1999).
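As a minimal sketch of how the entries of Table 5.4 are read and combined, the Python fragment below stores the CPT for Predator Threat as a dictionary, checks that every row sums to one, and computes a marginal distribution for the node. The CPT values are those of Table 5.4. The distributions over Site size and Predator presence, and the treatment of these two parents as independent, are placeholder assumptions made purely for illustration; they are not taken from the relocation BN.

```python
# CPT for Predator Threat (Table 5.4): P(threat | site size, predator presence).
STATES = ("high", "low", "no")
CPT = {
    ("min", "yes"): (0.99, 0.01, 0.0),
    ("min", "no"): (0.0, 0.0, 1.0),
    ("small", "yes"): (0.95, 0.05, 0.0),
    ("small", "no"): (0.0, 0.0, 1.0),
    ("medium", "yes"): (0.75, 0.25, 0.0),
    ("medium", "no"): (0.0, 0.0, 1.0),
    ("large", "yes"): (0.6, 0.3, 0.1),
    ("large", "no"): (0.0, 0.0, 1.0),
}

# Hypothetical parent distributions (assumed for illustration only).
P_SITE = {"min": 0.25, "small": 0.25, "medium": 0.25, "large": 0.25}
P_PRED = {"yes": 0.5, "no": 0.5}


def rows_sum_to_one(cpt, tol=1e-9):
    """Each row of a CPT is a probability distribution over the child states."""
    return all(abs(sum(row) - 1.0) <= tol for row in cpt.values())


def marginal(cpt, p_site, p_pred):
    """Sum the CPT rows, weighted by the probability of each parent combination."""
    totals = [0.0] * len(STATES)
    for (size, pred), row in cpt.items():
        weight = p_site[size] * p_pred[pred]  # parents assumed independent
        for i, p in enumerate(row):
            totals[i] += weight * p
    return dict(zip(STATES, totals))


threat = marginal(CPT, P_SITE, P_PRED)
print(CPT[("min", "yes")][0])  # the 0.99 entry discussed in the text
print(threat)
```

Changing the assumed parent distributions changes the marginal, which is the sense in which the probabilities obtained after compiling the network depend on both the CPT and the rest of the BN.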

Table 5.4: Conditional Probability Table for Predator Threat in Protected Fenced Relocation BN

Site Size   Predator Presence   high   low    no
min         yes                 0.99   0.01   0
min         no                  0      0      1
small       yes                 0.95   0.05   0
small       no                  0      0      1
medium      yes                 0.75   0.25   0
medium      no                  0      0      1
large       yes                 0.6    0.3    0.1
large       no                  0      0      1

A8

Site Factors

The site factor subnetworks for a relocation event are shown in Fig. 5.11 below. Ecological suitability (the result of the subnetwork in Fig. 5.12) and Site specific risks contributed to the appropriateness of the location as a relocation site (Site factors), regardless of whether the site was protected, unprotected, fenced or unfenced. For fenced relocation sites, the presence of Monitoring and predator proof fencing at neighbouring properties (Predator proof fence) were thought to affect the suitability of the site (Site factors), whereas for unfenced sites the size of the Home range of the Existing cheetah population is relevant. The existence and extent of Monitoring of the released cheetahs are important for fenced protected areas, whereas the Release type has to be taken into account for protected unfenced areas. Releasing into unfenced unprotected areas is usually difficult and monitoring is unlikely to be in place.

Figure 5.7: Site factors subnetworks for relocation on protected fenced sites (top), protected unfenced sites (left) and unprotected unfenced sites (right).


Chapter 6 Viability of the free-ranging cheetah population in Namibia - an Object Oriented Bayesian Network Approach

In this chapter, which has been written as a journal article (footnote 4), we illustrate the application of the IBNDC heuristic, which was introduced in Chapter 5, to model the viability of the free-roaming cheetah population in north-central Namibia. We describe the development of an integrated OOBN comprising three subnetworks, each focussing on a different aspect affecting the population: anthropology, habitat ecology and cheetah biology. We demonstrate how the IBNDC heuristic facilitates object oriented BN modelling by enabling the parallel development of three self-contained, yet integrated subnetworks. Finally, several scenarios are proposed and their predicted impact on the probability of the viability of the free-roaming cheetah population and other key factors in the integrated OOBN is presented and discussed.

4 The article has been submitted to Journal of Animal Ecology and is under review.

Viability of the free-ranging cheetah population in Namibia - an Object Oriented Bayesian Network Approach

Sandra Johnson (1*), Laurie Marker (2), Kerrie Mengersen (1), Chris H. Gordon (2,5), Jörg Melzheimer (3), Anne Schmidt-Küntzel (2), Matti Nghikembua (2), Fabiano Ezequiel (2), Josephine Henghali (4), Bettina Wachter (3)

1 Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia
2 Cheetah Conservation Fund, Otjiwarongo, Namibia
3 Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
4 Ministry of Environment and Tourism, P.O. Box 13306, Windhoek, Namibia
5 Present address: Wildlife Conservation Research Unit, Department of Zoology, University of Oxford, Oxford, UK


Abstract

Conservation of free-ranging cheetah (Acinonyx jubatus) populations is multifaceted and needs to be addressed from an ecological, biological and management perspective. There is a wealth of published research, with each study focussing on a particular aspect of cheetah conservation. Identifying the most important factors, making sense of various (and sometimes contrasting) findings, and taking decisions when little or no empirical data is available, are everyday challenges facing the conservationist. Bayesian Networks (BN) provide a statistical modelling framework that enables analysis and integration of information addressing different aspects of conservation. There has been an increased interest in the use of BNs to model conservation issues; however, the development of more sophisticated BNs, utilising object oriented (OO) features, is still at the frontier of ecological research. Our paper describes an integrated, parallel modelling process followed during a BN modelling workshop held at the Cheetah Conservation Fund in Namibia to combine expert knowledge and data about free-ranging cheetahs. The aim of the workshop was to obtain a more comprehensive view of the viability of the free-ranging cheetah population in Namibia and hence more effectively target those areas having the greatest impact on population viability. The BN was therefore developed by aggregating diverse perspectives from local fieldworkers, agents from the national ministry, conservation agency members and independent scientists. This integrated BN approach facilitates OO modelling in a multi-expert context which lends itself to a series of integrated, yet independent, subnetworks describing different scientific and management components.
We created three subnetworks in parallel: a biological, ecological and human factors network, which were then combined to create a complete representation of free-ranging cheetah population viability. Lessons learnt from this workshop have widespread relevance to the effective and targeted conservation management of other vulnerable and endangered species.


6.1

Introduction

Over the past century, free-ranging cheetahs (Acinonyx jubatus) have undergone a drastic reduction in both global geographic range and population size, leaving Namibia as one of the remaining strongholds for the species (Marker et al., 2007). The current global free-ranging cheetah population is estimated at fewer than 12,000 individuals, with the majority of cheetahs being found outside protected areas (Purchase et al., 2007). It is hard to reliably monitor population trends across Namibia and to derive accurate estimates of population size. However, there is general agreement that the minimum number of cheetahs nationwide is around 2000, with an upper boundary estimated at above 5000 animals (Hanssen and Stander, 2004). Data from farmers, through questionnaires, surveys and sightings reports, suggest that cheetah populations in Namibia could be increasing, although there are no independent data to substantiate this (Marker et al., 2007). However, the steady increase in human population size has resulted in an increase in livestock numbers and valuable trophy species, which has intensified the conflict between humans and cheetahs. Effective management and maintenance of a healthy cheetah population in Namibia is considered critical for cheetah conservation worldwide (Woodroffe et al., 2007) and knowledge gained from this population could prove invaluable for cheetah conservation and management in other range countries (Marker et al., 2007). The sustained free-ranging cheetah population in Namibia was the focus of a Bayesian network (BN) modelling workshop at the Cheetah Conservation Fund (CCF) in Namibia in June 2008, bringing together specialists from CCF, the Leibniz Institute for Zoo and Wildlife Research (IZW) and the Ministry of Environment and Tourism (MET). Bayesian Networks (BNs) are ideally suited for interactive, integrated modelling.
A BN is a mathematical model (Pearl, 1988; Neapolitan, 1990; Jensen and Nielsen, 2007) providing a graphical representation of key factors and interactions for an outcome of interest (Jensen and Nielsen, 2007; Uusitalo, 2007) such as the viability of a free-ranging cheetah population. The key factors are represented as nodes in the diagram and their dependencies on other key factors and the outcome of interest (target node) are depicted as directed links to form a directed acyclic graph (DAG) (Lauritzen and Sheehan, 2003). Underlying each node is a conditional probability table that is determined by the states of the node and its parent nodes (see Table 6.1 in Methods section). The data represented in the network may originate from diverse sources such as empirical data, expert opinion and simulation outputs (Pearl, 2000; Jensen and Nielsen, 2007). The BN provides probabilities for each node, including the target node, given the factors influencing them, their interactions and conditional probabilities (Pearl, 2000; Jensen and Nielsen, 2007). Bayesian networks are growing in popularity in environmental disciplines (Uusitalo, 2007) but the development of more sophisticated BNs, utilising dynamic and object oriented (OO) features, is still at the frontier of ecological research, where the available data may be sparse and the underlying biological and physical models very complex. Object oriented BNs (OOBN) are suited to dealing with large complex domains where traditional BNs are often inadequate (Koller and Pfeffer, 1997; Uusitalo, 2007). Moreover, environmental problems may be modelled by a series of integrated subnetworks describing different scientific components, combining scientific and management perspectives, or pooling similar contributions developed in different locations by different research groups. The input and output nodes (interface nodes) in OOBNs are the gateway to connectivity with other OOBNs (Jensen and Nielsen, 2007), facilitating the construction of complex and dynamic models (Koller and Pfeffer, 1997). This paper models the current threats facing Namibian cheetahs using the reservoir of data and expert opinion available and discusses possible strategies for addressing these threats to ensure the long-term conservation of this valuable population. The application of OOBN modelling techniques presents a novel approach, enabling the parallel development of inter-related, yet self-contained networks by specialist expert teams. These subnetworks are then merged to form a comprehensive model of the factors that influence the free-ranging cheetah population growth and decline in Namibia. This development approach for constructing interconnected BNs is eminently transferable to modelling the issues facing conservation and management of other species of interest or concern.

6.2

Methods

Study Area

Free-ranging cheetahs are spread throughout Namibia, with the highest density occurring in northern and central Namibia (Fig. 6.1). This study uses data and information from this section of the population to model the viability of the free-ranging cheetah population.

Figure 6.1: Map of Namibia showing the density of the cheetah population in Namibia (Marker, 2002)


The model developed at this workshop will be the basis for a Namibian National Cheetah Planning workshop planned in Namibia for 2009.

OOBN Modelling

We used the Iterative Bayesian Network Development Cycle (IBNDC) approach which was trialled at a cheetah relocation BN modelling workshop in South Africa (Johnson et al., unpublished). The IBNDC is divided into two parts, a Core Process and an Iterative Process. During the Core Process the target node was carefully defined by the expert teams. Thereafter the key factors believed to affect the target node were identified, defined and grouped into logical, coherent groups which were allocated to subnetworks. The workshop participants split into groups in accordance with the subnetwork which best suited their expert knowledge. For the Iterative Process we used Hugin® software because it is conducive to OO modelling, which is required for the IBNDC heuristic. The nodes (key factors) were reviewed and dependencies represented as arrows (directed links) between the nodes in the DAG to illustrate the direction of the relationship between them. The final activity for this first iteration was specifying the interfaces. An interface for a subnetwork consists of input nodes (placeholder nodes) and output nodes (visible to other OOBN subnetworks, Fig. 6.2). The interfaces enable the creation of integrated, yet separate, networks, by allowing the flow of information through these input and output nodes. The expert teams identified the nodes that were of interest to more than one subnetwork. The OOBN subnetwork that ‘owned’ the node would create it as an ‘output node’ to make it visible to the other subnetworks, and the OOBN subnetworks interested in receiving the node would create an ‘input node’ as a placeholder for that node (Jensen and Nielsen, 2007). The states of the node were decided by the team creating the output node and the other team ensured that their input node exactly matched those states.
The teams were able to work on the structure and content of their subnetworks without impacting on any other team, since the interface connections dictate the means of communication with the other OOBN subnetworks. This is an important concept in OO modelling known as encapsulation, or information hiding (Grady, 1994; Pastor et al., 2001). The IBNDC Iterative Process consists of four phases, Phase 1R (define), Phase 2R (quantify), Phase 3R (validate) and Phase 4R (evaluate). For each team, Phase 1R involved the careful definition and documentation of the nodes and their interactions. During Phase 2R the nodes were quantified. This encompassed agreeing on how the node could be measured and what information was available to determine the probability distribution of the node, the definition of the possible states of the nodes and their thresholds, and populating the conditional probability table (CPT) for the node given the different combinations of states of the parent nodes (Table 6.1).
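The interface idea described above can be sketched in a few lines of Python. This is an illustration only and not the Hugin implementation: the `Subnetwork` class and the `connect` function are hypothetical names introduced here. The node name and its states (Cheetah removal with states decrease and increase) follow Table 6.1. The check mirrors the rule that an input node must exactly match the states of the output node it stands in for, while everything else inside a subnetwork stays hidden from the other teams.

```python
from dataclasses import dataclass, field


@dataclass
class Subnetwork:
    """Hypothetical stand-in for an OOBN subnetwork; only its interface is modelled."""
    name: str
    outputs: dict = field(default_factory=dict)  # node name -> tuple of states
    inputs: dict = field(default_factory=dict)   # placeholder nodes


def connect(owner, receiver, node):
    """Bind a receiver's input node to the owner's output node of the same name."""
    if node not in owner.outputs:
        raise KeyError(f"{owner.name} has no output node {node!r}")
    if node not in receiver.inputs:
        raise KeyError(f"{receiver.name} has no input node {node!r}")
    if owner.outputs[node] != receiver.inputs[node]:
        raise ValueError(f"states of {node!r} do not match across the interface")
    return (owner.name, receiver.name, node)


human = Subnetwork("Human", outputs={"Cheetah removal": ("decrease", "increase")})
biological = Subnetwork("Biological", inputs={"Cheetah removal": ("decrease", "increase")})
link = connect(human, biological, "Cheetah removal")
print(link)
```

Because only the interface dictionaries are compared, a team can restructure the internal nodes of its subnetwork without affecting any other team, which is the encapsulation property the paragraph describes.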


Table 6.1: Conditional probability table (CPT) of Female mate choice node with states increase and decrease and parent nodes Cheetah removal, Intraspecific density and Immigration/emigration

                                                                     Female mate choice
Cheetah removal   Intraspecific density   Immigration/emigration    increase   decrease
decrease          higher                  gain                      0.8        0.2
decrease          higher                  loss                      0.7        0.3
decrease          medium                  gain                      0.6        0.4
decrease          medium                  loss                      0.55       0.45
decrease          lower                   gain                      0.45       0.55
decrease          lower                   loss                      0.35       0.65
increase          higher                  gain                      0.75       0.25
increase          higher                  loss                      0.65       0.35
increase          medium                  gain                      0.55       0.45
increase          medium                  loss                      0.5        0.5
increase          lower                   gain                      0.4        0.6
increase          lower                   loss                      0.3        0.7

In Phase 3R the OOBN subnetworks were validated by compiling and running the BN, and then checking whether the predictions were consistent with known behaviour and whether the BN respected known causal relationships. The testing was primarily done using expert knowledge to interpret the observed behaviour of the OOBN. Inconsistencies could be due to an error in the entered data (evidence), in one of the CPTs or in the directed links between the nodes. Inconsistent behaviour necessitated the reassessment of nodes, states and probabilities, which was addressed in the next iteration. At the start of every day, the subnetworks were reviewed by the other teams as a ‘sanity check’ and evaluation of the OOBN subnetwork. These are typical Phase 4R activities, which then initiated the next iteration of the IBNDC to process the changes resulting from the evaluation. Once the subnetworks were in a stable condition, they were merged into the overall network and the final set of iterations was performed on the entire OOBN. This involved evidence and parameter (probability) sensitivity analysis of the combined network, as well as scenario testing.

Scenario Testing

A key feature of BN modelling is inference, which enables us to draw conclusions about the outcome of interest (Bednarski et al., 2004), which here is the viability of the free-ranging cheetah population in north-central Namibia. A BN is often used to answer questions about the conditional distribution of the target variable based on values of specific variables in the BN (Laskey, 1995). Evidence is entered into the network to represent various scenarios of interest and this evidence is propagated through the BN, resulting in changed probabilities for the states of the hypothesis variable (target node). Therefore it is important to ensure that we understand the sensitivity of the BN to variations in the evidence (observed values of nodes) and parameters (probability values of CPTs) in the network. Some parameters may have been elicited from experts, which may be biased and based on intuition rather than real data, necessitated by the lack of available data or the nature of the information (Bednarski et al., 2004). However, many parameters do not require great precision and expert opinions are ideally suited to estimate them, but it is critical for the authenticity of the BN inference to increase the accuracy of those parameters which may have a more profound effect on the hypothesis variable. We can view them as the ‘weak points’ in the network (Bednarski et al., 2004), since if these probabilities are inaccurate or incorrect they can result in false or misleading conclusions and predictions.
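The effect of entering evidence can be illustrated with the Female mate choice CPT of Table 6.1. The sketch below is not the propagation algorithm used by Hugin; it simply enumerates parent combinations. It assumes, purely for illustration, that the three parent nodes are independent and uniformly distributed whenever no evidence is entered for them. The function name `p_increase` and the evidence keys are hypothetical.

```python
from itertools import product

REMOVAL = ("decrease", "increase")
DENSITY = ("higher", "medium", "lower")
MIGRATION = ("gain", "loss")

# P(Female mate choice = increase | parents), in the column order of Table 6.1.
P_INCREASE = dict(zip(
    product(REMOVAL, DENSITY, MIGRATION),
    (0.8, 0.7, 0.6, 0.55, 0.45, 0.35, 0.75, 0.65, 0.55, 0.5, 0.4, 0.3),
))


def p_increase(evidence=None):
    """P(increase) given evidence on some parents.

    Parents without evidence are assumed uniform and independent (an
    illustrative assumption), so the answer is the mean of the CPT entries
    that are consistent with the evidence.
    """
    evidence = evidence or {}
    names = ("removal", "density", "migration")
    consistent = [
        p for combo, p in P_INCREASE.items()
        if all(evidence.get(n, v) == v for n, v in zip(names, combo))
    ]
    return sum(consistent) / len(consistent)


print(p_increase())                           # no evidence entered
print(p_increase({"removal": "decrease"}))    # one parent observed
print(p_increase({"removal": "decrease", "density": "higher", "migration": "gain"}))
```

Each additional piece of evidence narrows the set of CPT columns that contribute, which is why the probability for the child node shifts as a scenario is specified more fully.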

Sensitivity Analysis

Sensitivity analysis of a BN is therefore a vital part of the evaluation process. It involves the assessment of the sensitivity of the target node to variations in the evidence entered into the network (evidence sensitivity) and to variations in the values of the parameters (parameter sensitivity) (Varis and Kuikka, 1999; Bednarski et al., 2004; Pollino et al., 2007b). Evidence sensitivity measures the degree of variation in the BN’s posterior distribution resulting from changes in the evidence and assists the expert team in targeting future data collection and identifying any errors in the BN structure or CPTs (Jensen and Nielsen, 2007; Pollino et al., 2007a). Two popular ways in which to measure evidence sensitivity are entropy and mutual information (Pollino et al., 2007a). Entropy measures the randomness of a variable (key factor) (Pearl, 1988; Korb and Nicholson, 2004; Kjaerulff and Madsen, 2007) and the higher the value the more random the variable. Mutual information gives an indication of the extent to which the joint probability of two variables differs from what it would have been if they were independent (Korb and Nicholson, 2004; Pollino et al., 2007b). Therefore a value of 0 for mutual information means that the key factors are in fact independent (Pearl, 1988; Kjaerulff and Madsen, 2007). Using parameter sensitivity we can identify those parameters which cause the biggest changes in the posterior probabilities of the outcome of interest. Efforts are then directed to improve the level of accuracy for those parameters (Pollino et al., 2007b) and to channel expert elicitation efforts (van der Gaag et al., 2007). One-way sensitivity analysis is done by varying one of the parameters while keeping all the others fixed and then measuring the variation in the output parameter (Bednarski et al., 2004).
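The two evidence sensitivity measures can be written down directly from their standard definitions. The sketch below uses natural logarithms, which is an assumption (the base is not stated in the text), and the joint distributions in the example are made-up numbers chosen only to show the independent and dependent cases.

```python
import math


def entropy(dist):
    """Shannon entropy H(X) = -sum p ln p of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def mutual_information(joint):
    """I(X;Y) for a joint table where joint[i][j] = P(X=i, Y=j)."""
    px = [sum(row) for row in joint]          # marginal of X
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    return sum(
        pxy * math.log(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )


# Hypothetical joint distributions for two binary nodes.
independent = [[0.25, 0.25], [0.25, 0.25]]   # P(x, y) = P(x) P(y)
dependent = [[0.4, 0.1], [0.1, 0.4]]

print(entropy([0.5, 0.5]))                # uniform two-state node
print(mutual_information(independent))    # zero when the nodes are independent
print(mutual_information(dependent))      # positive when they share information
```

A uniform distribution gives the largest entropy for a given number of states, and mutual information vanishes exactly when the joint table factorises into its marginals, matching the two interpretations given above.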

6.3

Results

Core Process

The target node was defined by the expert team as the viability of the free-ranging cheetah population in north-central Namibia. The factors believed to affect the sustainability of this cheetah population readily separated into three coherent groups: ecological, biological and human factors.


Iterative Process

The final set of interface nodes is shown in Fig. 6.2. The input nodes have a broken line and the output nodes a solid line.

Figure 6.2: Interface nodes for the three OOBN subnetworks. The Human OOBN subnetwork has five output nodes: Human Population Growth, Cheetah removal, Human Habitat Impact, Prey poaching and Land use. The Ecological OOBN subnetwork has four input nodes: Cheetah removal, Human Habitat Impact, Prey Poaching, Land use and two output nodes: Prey availability, Intraspecific density. The Biological OOBN subnetwork has four input nodes: Human population growth, Cheetah removal, Prey availability and Intraspecific density and three output nodes: Recruitment, Immigration-emigration and Mortality which all feed into the target node of the combined network, Cheetah population viability.

Integrated OOBN

While biological factors have a direct effect on the viability of the cheetah population, they are in turn under strong influence from human and ecological factors. Ecological factors have a significant effect on the biology of the cheetah (Genetic and Health) through Prey availability and Intraspecific density, especially if population size drops to a critical level. The ecological factors are in turn affected by human impact (Human Habitat Impact, Prey poaching, Land use and Cheetah removal). Excessive human pressures would affect preferred cheetah habitat, prey abundance and cheetah densities. Direct effects of Human population growth on biological factors were considered to be minor; however, Human population growth will affect the viability of the wild cheetah population indirectly through Human habitat impact and Prey poaching. Cheetah removal has a direct impact on Mortality, but also influences Female mate choice and Genetic variation.


Human Factors OOBN

The human factors OOBN has thirteen nodes and is shown in Fig. 6.3. The interface consists of five output nodes and no input nodes. The output nodes Human population growth and Cheetah removal are accessed by the biological subnetwork, and Land use, Prey poaching, Human Habitat Impact and Cheetah removal by the ecological factors OOBN. The motivation for the selection of these nodes is available in the appendix.

Figure 6.3: The thirteen nodes of the Human Factors OOBN subnetwork. The output nodes have a double line and are: Cheetah removal, Land use, Human Population Growth, Human Habitat Impact, Prey poaching.

Ecological Factors OOBN

Figure 6.4 depicts the ecological factors subnetwork containing eleven nodes, of which six form the interface with other OOBN subnetworks. There are four input nodes from the human factors OOBN and two output nodes which are accessed by the biological OOBN. While ecological factors can have a significant effect on those living within that ecosystem, it is the pressures from other factors that most affect the ecological balance. The expert team determined that human factors (Human Habitat Impact, Prey poaching, Land use and Cheetah removal) would have the greatest impact on the ecology of the cheetah. Excessive human pressures would affect preferred cheetah habitat (Vegetation structure and Available space), prey abundance (Prey availability) and cheetah densities (Intraspecific density). The ecological factors would in turn affect the biology of the cheetah (Prey availability and Intraspecific Density).


Figure 6.4: The final version of the Ecological Factors subnetwork showing the input nodes Human Habitat Impact, Prey poaching, Land Use and Cheetah removal from the Human factor OOBN, and the output nodes Prey availability and Intraspecific Density.

The ecological OOBN was designed around the two core ecological factors: access to food and competition. The latter is divided into intraspecific (Intraspecific Density) and intraguild competition (Intraguild Density). Further detail on the background to this OOBN subnetwork is available in the appendix to this manuscript.

Biological Factors OOBN

The biological OOBN subnetwork, shown in Fig. 6.5, contains eleven key nodes with the interface consisting of four input nodes: Human population growth and Cheetah removal from the human factors OOBN and Intraspecific density and Prey availability from the ecological factors OOBN; as well as three output nodes: Immigration-emigration, Mortality and Recruitment. The biological factors were designed around Health, Genetics and behaviour (Female mate choice).

Figure 6.5: The Biological Factors subnetwork showing the input nodes Human population growth and Cheetah removal from the Human factor OOBN and Intraspecific density and Prey availability from the Ecological factor OOBN, and the output Cheetah population viability.

All biological factors feed into Mortality and Recruitment to reflect that these two factors, in addition to Immigration/emigration, define the viability trend of a population. Additional information leading to the construction of this subnetwork is available in the appendix.

Sensitivity Analysis

Table 6.2 shows the calculated entropy for the variables in the combined OOBN. The larger the value, the more random the probability distribution of the variable, meaning that the largest possible value is attained for a uniform distribution (Kjaerulff and Madsen, 2007).

Table 6.2: Entropy values for the nodes in the combined OOBN. The entropy value for the target node (Free-ranging cheetah population viability) is shown in italics at the top of the table as a reference for the values of the other nodes. The entropy can be considered as a measure of how ‘uninformative’ a variable is. Therefore the larger the value, the more random the distribution (Kjaerulff and Madsen, 2007)

Node                                        Entropy
Free-ranging cheetah population viability   0.692

Legislation implementation                  0.303
Land use                                    0.489
Environmental education                     0.500
Local community awareness                   0.579
Farmer education                            0.611
Social impacts                              0.636
Plant biomass production                    0.642
Cheetah removal                             0.653
Human habitat impact                        0.662
Health                                      0.663
Livestock & wildlife management             0.675
Recruitment                                 0.682
Prey poaching                               0.689
Genetic                                     0.691
Female mate choice                          0.692
Human population growth                     0.692
Stress                                      0.693
Mortality                                   0.693
Immigration-emigration                      0.693
Vegetation structure                        0.866
Rain                                        0.867
Intraguild density                          0.927
Intraspecific density                       0.929
Available space                             0.975
Prey availability                           1.066
Human cheetah conflict                      1.079
Economic benefits                           1.098
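The entries of Table 6.2 can be sanity-checked by hand. The worked equations below assume that entropy is computed with natural logarithms (the base is not stated, but the reported values fit this choice) and that the target node has two states, viable and not viable, with the 52.4% probability of viability reported under Scenario Testing. Under those assumptions the target entropy of 0.692 is reproduced, and the value 0.693 reported for Stress, Mortality and Immigration-emigration corresponds to a uniform two-state distribution.

```latex
% Entropy of a discrete node X with state probabilities p_i (natural log assumed)
H(X) = -\sum_{i} p_i \ln p_i

% Uniform two-state node: the maximum for two states
H = -\left(0.5 \ln 0.5 + 0.5 \ln 0.5\right) = \ln 2 \approx 0.693

% Target node, assuming P(\text{viable}) = 0.524 and P(\text{not viable}) = 0.476
H = -\left(0.524 \ln 0.524 + 0.476 \ln 0.476\right)
  \approx 0.3386 + 0.3534 \approx 0.692
```

Values above 0.693, such as those for Vegetation structure or Economic benefits, can only arise for nodes with more than two states, since the maximum for an n-state node is ln n.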

The other measure of evidence sensitivity is mutual information, which is listed in Table 6.3. The mutual information between the target node, Free-ranging cheetah population viability, and the other nodes in the OOBN is representative of the amount of information shared between the target node and each of the variables. The greater the value, the greater the amount of information shared. It follows that Mortality and Recruitment share the most information with Free-ranging cheetah population viability, followed by Health and Prey availability. On the other hand, Free-ranging cheetah population viability appears to be independent of Environmental education and Human population growth.

Table 6.3: Mutual information between the hypothesis variable (Free-ranging cheetah population viability) and each of the variables listed in the table. They are ordered in descending order so that those variables sharing most information with the target node are at the top of the list (Kjaerulff and Madsen, 2007).

Node                                 Mutual information
Mortality                            0.0774
Recruitment                          0.0759
Health                               0.0253
Prey availability                    0.0243
Plant biomass production             0.0137
Immigration-emigration               0.0051
Cheetah removal                      0.0037
Rain                                 0.0031
Genetic                              0.0022
Female mate choice                   0.0011
Economic benefits                    0.0006
Intraspecific density                0.0005
Vegetation structure                 0.0002
Human cheetah conflict               0.0002
Farmer education                     0.0001
Land use                             0.0001
Stress                               0.0001
Human habitat impact                 0.0001
Intraguild density                   0.0000
Social impacts                       0.0000
Legislation implementation           0.0000
Available space                      0.0000
Livestock & wildlife management      0.0000
Prey poaching                        0.0000
Local community awareness            0.0000
Environmental education              0
Human population growth              0
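To make the interpretation of Table 6.3 concrete, the sketch below computes mutual information directly from a joint probability table of two discrete variables. It is a minimal illustration under stated assumptions: the natural logarithm is used, and both joint tables are hypothetical numbers chosen for the example rather than values taken from the cheetah OOBN. The function name `mutual_information` is likewise introduced here.

```python
import math

def mutual_information(joint):
    """I(X;Y) in nats, where joint[x][y] is the joint probability P(X=x, Y=y)."""
    px = [sum(row) for row in joint]          # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (column sums)
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                total += pxy * math.log(pxy / (px[i] * py[j]))
    return total

# Hypothetical joint tables (rows: target viable / declining; columns: states of another node).
independent = [[0.524 * 0.3, 0.524 * 0.7],
               [0.476 * 0.3, 0.476 * 0.7]]   # joint equals the product of the marginals
dependent = [[0.400, 0.124],
             [0.100, 0.376]]                 # same row marginals, but the states co-vary

print(abs(mutual_information(independent)) < 1e-9)   # True: nothing is shared
print(mutual_information(dependent) > 0)             # True: information is shared
```

A value of zero corresponds to independence, which is how the zero entries at the bottom of Table 6.3 are read in the text.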

Scenario Testing

When the combined BN was run, the probability that the free-ranging cheetah population in north-central Namibia would be viable was reported as 52.4%. Thus, if the current level of conservation activities continues and everything else remains largely unchanged, the free-ranging population in north-central Namibia is marginally more likely to be viable than to decline. However, the slender margin between being viable and not being viable means that the population continues to be under threat, especially if the status quo changes due to unforeseen circumstances or if some of the undesirable trends persist or increase. The experts were interested in exploring the effect that several different scenarios may have on the viability of this population. The posterior probabilities for the nodes of interest are shown in Fig. 6.6 prior to any evidence being added to the OOBN.

Figure 6.6: Posterior probabilities of the combined OOBN, with the target node, Free-ranging cheetah population viability, in the top left corner of the figure showing the probability of 52.4% of being viable and 47.6% of declining.

These values therefore formed the basis for comparison of the predicted probabilities from the scenarios. The following five scenarios were proposed by the expert team and tested on the combined OOBN model:

1. Cheetah removal from farmlands ceases

The farmers are known to be interested in how much the cheetah population would grow if they stop shooting them (farmers of the Seeis Conservancy, pers. comm). Although the BN cannot predict the increase in cheetah population numbers, entering evidence into the network to represent this scenario results in the probability of a viable population increasing to 58.1% (percentage increase of 10.9%). This is a promising outlook for the sustainability of the free-ranging cheetah population, and it makes a change in farmer attitudes a worthwhile strategy to target. This scenario is likely to occur alongside substantially higher economic benefits to farmers, increasing from 34.6% to 56.9% (percentage increase of 64.5%), and increased farmer education, from 30.0% to 39.0% (percentage increase of 30.0%). If removal of cheetahs ceased, a percentage decrease of 33.6% in mortality is expected, along with a predicted trend for genetic variability to increase over time (52.4% probability that genetic variability will increase). Furthermore, the expected increase in cheetah population is predicted to result in only a small drop in prey abundance (percentage decrease of 1.6%).

2. Increased farmer and environmental education

One strategy to combat free-ranging cheetah population decline is to target both farmer and environmental education, especially in those areas which fall into the free-ranging population’s home ranges. If the network is updated with the ideal situation of 100% coverage of environmental and farmer education, the probability of a viable free-ranging cheetah population increases only marginally to 53.5%, a percentage increase of a mere 2.9%. However, education is generally accepted to be a long-term strategy and there are profound positive trends in other nodes shown to have a significant impact on the viability of the cheetah population, such as cheetah removal. The probability that cheetah removal will decrease improves by 33.2% (from 35.9% to 47.8%), and in the previous scenario the effect of this variable on the target node was substantial. Furthermore, a large reduction in human cheetah conflict was observed (percentage improvement of 64.7%, from 37.1% for low conflict to 61.1%), as was a very substantial increase in wildlife and livestock management (percentage increase of 60.5%, from 40.5% to 65.0%). Another positive outcome was the percentage change of 72.5% in human habitat impact (from 37.5% positive to 64.7%), which is known to be one of the major contributors to the worldwide decline in free-ranging cheetah populations (GCCAP, 2002). This suggests that the cheetah population viability is set to increase further over time due to the positive trends in several influential variables.

3. Increased prey abundance

Ungulates migrate to water points as a means of survival, and because farmers are creating more permanent water holes, water is freely available. Consequently the ungulates roam near the water points and do not need to migrate. This increases the survival rate of the offspring, and populations build up around these water points (Gunther Roeber, CCF Farmer Education Coordinator, pers. comm). When evidence of prey abundance was entered in the OOBN, a substantial increase in the probability of a viable cheetah population, to 59.8%, was observed. This farmer activity is creating an artificial prey abundance which would usually be accompanied by higher rainfall (percentage increase in high rainfall of 36.3%) and consequently a greatly improved plant biomass production (percentage increase of 51.4%). Nonetheless the model predicts a positive impact on the viability of the free-ranging cheetah population without any anticipated increase in human cheetah conflict. However, because this increase is not accompanied by the expected increase in plant biomass production and rainfall, which would be the case had the prey abundance occurred naturally, there may be negative impacts on habitat and other factors which have not been modelled here.

4. Climate change

Namibia has periodic drought cycles which are accompanied by decreased prey availability and less tolerance of predators by farmers (Marker et al., 2007). Climate change in Namibia is expected to see an increase in temperature and a reduction in rainfall (Thuiller et al., 2006). By entering evidence of low rainfall into the OOBN, the cheetah population is predicted to decline (probability of population being viable is 46.6%). If we include further evidence of insufficient plant biomass as a consequence of climate change, we observe an even more dramatic fall to 40.9%, which bodes ill for the free-ranging cheetah population. This represents a percentage decrease of 21.9% on the original probability of population viability and is accompanied by insufficient prey (percentage increase in probability of insufficient prey of 179.8%) and a decline in cheetah population health (percentage increase in health deterioration of 73.8%).


5. Disease outbreak

Although the cheetah experts felt that a disease outbreak would be highly unlikely in the free-ranging cheetah population, they were interested in modelling this scenario. Evidence of decreased health of the free-ranging cheetah population was entered into the OOBN. This had a devastating effect on the viability of the population, causing the probability of a viable population to drop to 38.0%. It would therefore be prudent for conservation organisations and research institutions such as CCF and IZW to continue to closely monitor the health of this free-ranging cheetah population in order to pre-empt any possible health issues. If evidence of a disease outbreak is detected, early intervention would then be possible, preventing any deleterious consequences.
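The percentage increases and decreases quoted in the scenarios above are relative changes with respect to the baseline probability, not differences in percentage points. The short sketch below reproduces three of the reported figures; the helper name `percentage_change` is introduced here for illustration only.

```python
def percentage_change(before, after):
    """Relative change from `before` to `after`, expressed as a percentage of `before`."""
    return (after - before) / before * 100.0

baseline = 52.4  # baseline probability (%) of a viable population

# Scenario 1: probability of a viable population rises from 52.4% to 58.1%.
print(round(percentage_change(baseline, 58.1), 1))   # 10.9

# Scenario 1: economic benefits to farmers rise from 34.6% to 56.9%.
print(round(percentage_change(34.6, 56.9), 1))       # 64.5

# Scenario 4: probability of a viable population falls from 52.4% to 40.9%.
print(round(percentage_change(baseline, 40.9), 1))   # -21.9
```

For comparison, the change in Scenario 1 is 5.7 percentage points in absolute terms, which corresponds to the 10.9% relative increase reported.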

6.4 Discussion

The IBNDC approach guides BN model design within an object oriented framework and is therefore conducive to the creation of integrated networks. This study demonstrated the successful application of this heuristic at the CCF BN modelling workshop where three independent networks were constructed, each focussing on a specific aspect of the free-ranging cheetah population in north-central Namibia, while allowing information flow through the three interfaces defined by the expert team. Each interface isolated the inner workings of one subnetwork from the other subnetworks and therefore, provided the interface was honoured, changes could be made without affecting another OOBN. This facilitated the parallel development of three subnetworks (human, biological and ecological) for subsequent integration into a combined OOBN. This feature is particularly appealing when modelling environmental issues of concern which involve several distinct domain experts. The IBNDC approach makes efficient use of their time by allowing them to work concurrently, yet independently, and then to exchange knowledge and perform cross validation, evaluation and scenario testing on the integrated network. Furthermore, the continual development cycle of the IBNDC is appealing and relevant to multi-disciplined ecological issues such as species conservation, enabling the perpetual refinement and development of integrated OOBNs as new data and knowledge come to light.

The integrated OOBN constructed for the free-ranging wild cheetah population in north-central Namibia represents the current expert knowledge and data for this population. The scenario testing confirmed observed trends and suggested increased focus on health monitoring, changing farmer perceptions and continued efforts in farmer and environmental education. This OOBN can be viewed as a ‘work in progress’ to be adapted and refined as new data are collected and to represent the latest research on this free-ranging cheetah population. Although the OOBN structure and quantification for the cheetah population in Namibia would have to be altered depending on individual circumstances, lessons learned through this study have widespread applications in other places where conservation on private land is critical to the maintenance of viable populations of large carnivores and in those areas most critical for future cheetah conservation.
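The encapsulation provided by an interface, as discussed above, can be illustrated with a deliberately simplified sketch. It is not the OOBN software or the model used in this study: each subnetwork is reduced to a function from its input node to its output node, every probability is a hypothetical placeholder, and the function names are introduced here. Passing only a marginal distribution across the interface is exact in this sketch because the interface is a single node in a chain; a full OOBN tool also propagates evidence backwards across interfaces.

```python
# All probabilities below are hypothetical placeholders, not values from the cheetah OOBN.

def human_subnetwork(p_farmer_education_high):
    """Human factors subnetwork. Output node: P(Cheetah removal = high)."""
    p_removal_given_education = {True: 0.3, False: 0.7}   # hypothetical CPT
    return (p_farmer_education_high * p_removal_given_education[True]
            + (1 - p_farmer_education_high) * p_removal_given_education[False])

def biological_subnetwork(p_removal_high):
    """Biological subnetwork. Input node: P(Cheetah removal = high).
    Output node: P(Mortality = high)."""
    p_mortality_given_removal = {True: 0.8, False: 0.4}   # hypothetical CPT
    return (p_removal_high * p_mortality_given_removal[True]
            + (1 - p_removal_high) * p_mortality_given_removal[False])

def combined(p_farmer_education_high):
    """Combined network: the subnetworks communicate only through the interface node."""
    return biological_subnetwork(human_subnetwork(p_farmer_education_high))

print(round(combined(0.3), 3))   # 0.632: no evidence, prior P(education high) = 0.3
print(round(combined(1.0), 3))   # 0.52: evidence entered that farmer education is high
```

In this sketch the internals of `human_subnetwork` could be revised by one expert group without any change to `biological_subnetwork`, as long as the output node keeps the same meaning and states.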

Acknowledgements

We wish to thank Cheetah Conservation Fund for hosting the workshop and for the excellent hospitality. We also wish to thank Anne Marie Stewart and Burton Gaiseb for their contributions. We acknowledge the financial support provided by the Australian Research Council’s International Linkage Grant. For helpful comments on farmer practices we thank Gunther Roeber.

6.5 Literature Cited

Bednarski M, Cholewa W and Frid W 2004 Identification of sensitivities in Bayesian networks Engineering Applications of Artificial Intelligence 17 327-335
Caro T M 1994 Cheetahs of the Serengeti plains: group living in an asocial species (University of Chicago Press)
CITES 1992 Quotas for trade in specimens of cheetah. In: Eighth Meeting of the Convention of International Trade in Endangered Species of Wild Fauna and Flora (Geneva) pp 1-5
de Klerk J N 2004 Bush encroachment in Namibia (Windhoek: Solitaire Press)
Durant S M 2000 Living with the enemy: avoidance of hyenas and lions by cheetahs in the Serengeti Behavioural Ecology 11 624-632
Fabiano E C 2007 Evaluation of Spoor Tracking to Monitor Cheetah Abundance in Central Northern Namibia (Pietermaritzburg: University of KwaZulu-Natal)
GCCAP 2002 Global Cheetah Action Plan Review final workshop report. In: Global Cheetah Conservation Action Plan - Workshop, ed P Bartels, et al. (Shumba Valley Lodge, South Africa: Apple Valley, MN: IUCN / SSC Conservation Breeding Specialist Group) p 78
Gottelli D, Wang J, Bashir S and Durant S M 2007 Genetic analysis reveals promiscuity among female cheetahs Proceedings of the Royal Society of London Series B 274 1993-2001
Grady B 1994 Object-oriented analysis and design with applications, second edition (Redwood City, California: Benjamin/Cummings Pub. Co.)
Hanssen L and Stander P 2004 Namibia Large Carnivore Atlas. Predator Conservation Trust, Windhoek, Namibia (http://www.predatorconservation.com/atlas%20project.htm, accessed 5 July 2009)
Harvey P H and Bradbury J W 1991 Sexual Selection, Behavioural Ecology, an evolutionary approach, ed J R Krebs and N B Davies (Blackwell Scientific Publications)
Jensen F V and Nielsen T D 2007 Bayesian Networks and Decision Graphs (Springer Science + Business Media, LLC)


Kauffman M J, Sanjayan M, Lowenstein J, Nelson A, Jeo R M and Crooks K R 2007 Remote camera-trap methods and analyses reveal impacts of rangeland management on Namibian carnivore communities Oryx 41 70-78
Kjaerulff U B and Madsen A L 2007 Bayesian networks and influence diagrams (New York: Springer)
Koller D and Pfeffer A 1997 Object-Oriented Bayesian Networks. In: Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-97) (Providence, Rhode Island) pp 302-313
Korb K B and Nicholson A E 2004 Bayesian Artificial Intelligence (Boca Raton: Chapman & Hall/CRC)
Laskey K B 1995 Sensitivity Analysis for Probability Assessments in Bayesian Networks IEEE Transactions on Systems, Man and Cybernetics 25 901-909
Laurenson M K 1994 High juvenile mortality in cheetahs (Acinonyx jubatus) and its consequences for maternal care Journal of Zoology 234 387-408
Laurenson M K, Caro T M and Borner M 1992 Female cheetah reproduction National Geographic Research & Exploration 8 64-75
Laurenson M K, Wielebnowski N and Caro T M 1995 Extrinsic Factors and Juvenile Mortality in Cheetahs Conservation Biology 9 1329-1331
Lauritzen S L and Sheehan N A 2003 Graphical Models for Genetic Analyses Statistical Science 18 489-514
Marker-Kraus L, Kraus D, Barnett D and Hurlbut S 1996 Cheetah Survival on Namibian Farmlands (Windhoek: Cheetah Conservation Fund)
Marker L 2002 Aspects of Cheetah (Acinonyx jubatus) Biology, Ecology and Conservation Strategies on Namibian Farmlands. In: Department of Zoology (Oxford: University of Oxford) p 516
Marker L, Dickman A, Wilkinson C, Schumann B and Fabiano E 2007 The Namibian Cheetah: Status Report CAT News, Special Edition 3 4-13
Marker L, Dickman A J, Jeo R M, Mills M G L and Macdonald D W 2003a Demography of the Namibian cheetah Biological Conservation 114 413-425
Marker L, Dickman A J, Mills M G L, Jeo R M and Macdonald D W 2008a Factors influencing the spatial distribution of cheetahs (Acinonyx jubatus) on north-central Namibian farmlands Journal of Zoology 274 226-238
Marker L, Mills M G L and Macdonald D W 2003b Aspects of the management of cheetahs trapped on Namibian farmlands Biological Conservation 114 401-412
Marker L L and Dickman A J 2003 Morphology, physical condition and growth of the cheetah (Acinonyx jubatus jubatus) Journal of Mammalogy 84 840-850

Marker L L, Fabiano E C and Nghikembua M 2008b The Use of Remote Camera Traps to Estimate Density of Free-ranging cheetahs in North-Central Namibia CAT News 49 22-24
Marker L L, Muntifering J R, Dickman A J, Mills M G L and Macdonald D W 2002 Quantifying prey preferences of free-ranging Namibian cheetahs South African Journal of Wildlife Research 33 43-53
Meldelsohn J and el Obeid S 2005 Forest and Woodlands of Namibia (Windhoek, Namibia: Raison)
Menotti-Raymond M and O'Brien S J 1993 Dating the genetic bottleneck of the African cheetah Proceedings of the National Academy of Sciences of the United States of America 90 3172-3176
Munson L, Terio K A, Worley M, Jago M, Bagot-Smith A and Marker L 2005 Extrinsic factors significantly affect patterns of disease in free-ranging and captive cheetahs (Acinonyx jubatus) populations Journal of Wildlife Diseases 41 542-548
Muntifering J R, Dickman A J, Perlow L M, Hruska T, Ryan P G, Marker L and Jeo R M 2006 Managing the matrix for large carnivores: a novel approach and perspective from cheetah (Acinonyx jubatus) habitat suitability modelling Animal Conservation 9 103-112
National Planning Commission 2004 Namibia Vision 2030: Policy Framework for Long-Term National Development (Office of the President: Windhoek, Namibia)
Neapolitan R E 1990 Probabilistic Reasoning in Expert System Applications (New York: Wiley)
Nghikembua M T 2008 Quantifying farmers’ perceptions and willingness; as well as availability of encroaching aboveground Acacia bush biomass on CCF commercial farmlands in north central Namibia, Mini dissertation research report. In: Faculty of Economics and Management Sciences: Centre for Development Support (Bloemfontein, South Africa: University of the Free State)
O'Brien S J, Roelke M E, Marker L, Newman A, Winkler C A, Meltzer D, Colly L, Evermann J F, Bush M and Wildt D E 1985 Genetic basis for species vulnerability in the cheetah Science 227 1428-1434
O'Brien S J, Wildt D E, Gildman D, Merril C R and Bush M 1983 The cheetah is depauperate in genetic variation Science 221 459-462
Pastor O, Gomez J, Insfran E and Pelechano V 2001 The OO-Method approach for information systems modeling: from object-oriented conceptual modeling to automated programming Information Systems 26 507-534
Pearl J 1988 Probabilistic Reasoning in Intelligent Systems (San Francisco, California: Morgan Kaufmann Publishers Inc)
Pearl J 2000 Causality: models, reasoning, and inference (Cambridge: Cambridge University Press)


Pollino C A, White A K and Hart B T 2007a Examination of conflicts and improved strategies for the management of an endangered Eucalypt species using Bayesian networks Ecological Modelling 201 37-59
Pollino C A, Woodberry O, Nicholson A, Korb K and Hart B T 2007b Parameterisation and evaluation of a Bayesian network for use in an ecological risk assessment Environmental Modelling & Software 22 1140-1152
Purchase G K, Marker L, Marnewick K, Klein R and Williams S 2007 Regional assessment of the status, distribution and conservation needs of cheetahs (Acinonyx jubatus) in southern Africa: Summary of Country Reports CAT News, Special Edition 3 43
Thalwitzer S 2007 Reproductive activity in cheetah females, cub survival and health in male and female cheetahs on Namibian farmland (Free University Berlin)
Thuiller W, Midgley G F, Hughes G O, Bomhard B, Drew G, Rutherford M C and Woodward F I 2006 Endemic species and ecosystem sensitivity to climate change in Namibia Global Change Biology 12 759-776
Uusitalo L 2007 Advantages and challenges of Bayesian networks in environmental modelling Ecological Modelling 203 312-318
van der Gaag L C, Renooij S and Coupe V M H 2007 Sensitivity Analysis of Probabilistic Networks Advances in Probabilistic Graphical Models (Berlin / Heidelberg: Springer) pp 103-124
Varis O and Kuikka S 1999 Learning Bayesian decision analysis by doing: lessons from environmental and natural resources management Ecological Modelling 119 177-195
Woodroffe R, Frank L G, Lindsey P A, ole Ranah S M K and Romañach S 2007 Livestock husbandry as a tool for carnivore conservation in Africa’s community rangelands: a case–control study Biodiversity and Conservation 16 1245–1260


6.6 Appendix

6.6.1 Background to Human subnetwork

Namibia is sparsely populated (National Planning Commission, 2004), and its climate and unreliable rainfall have created a fragile environment with limited productivity. An increase in population puts further pressure on limited natural resources to satisfy basic human needs such as food, shelter, health, and employment (Human Population Growth). Commercial farmlands have higher wildlife densities and species richness and lower human population density, and are managed for both livestock and wildlife, whereas the opposite is true for communal land. High livestock densities compounded by ineffective livestock and veld management practices result in productive wildlife habitat becoming overgrazed and bush encroached (de Klerk, 2004). About 90% of the cheetah population is found outside protected areas (Marker, 2002) where commercial and communal farming is practised (Meldelsohn and el Obeid, 2005). Moreover, Kauffman et al. (2007) detected changes in carnivore communities adjacent to communal and commercial farmlands in Namibia. Their study shed more light on the effects of anthropogenic actions and differing management regimes on natural resources outside of protected areas (Livestock and wildlife management).

With this heightened potential for interaction between farmers and cheetahs, there is increased focus on Farmer education, which is considered crucial to long term conservation strategies. Developing education and training programs that promote an integrated approach to livestock, wildlife and predator management is vital to improving basic understanding of farm production principles such as animal health, vaccinations, breeding, financial management, rangeland management, and predator conflict prevention. More generally, Environmental Education has been identified as a priority for cheetah conservation throughout Africa. Given that human predator conflict is multi-faceted, there is a need to create a knowledge based society. However, a lack of trained professionals and of broad public awareness threatens the long term sustainability of conservation efforts.

Changes in Land use patterns from commercial to communal systems, e.g. where large tracts of land are subdivided for resettlement, are likely to have significant effects on wildlife population density and species diversity. Ownership of resources and the land tenure systems, especially on communal lands, also influence the manner in which people perceive natural resources. As such, it is believed that communal resources would continue to be exploited, unless people are empowered with the responsibility to manage the resources on a sustainable basis. Land use categories identified were commercial farmlands, communal land and state owned land. Land tenure systems play a key role in determining the outcomes and implementation of management decisions (Muntifering et al., 2006; Nghikembua, 2008) (Local community awareness).


Human predator conflict is a major issue for predator conservation (Woodroffe et al., 2007). Considerable research has been conducted on Namibian cheetahs and has shown that this population has been reduced in numbers in the past century, primarily due to perceived and actual livestock and game loss leading to Human cheetah conflict (Marker et al., 2002; Marker et al., 2003a; Marker et al., 2003b). The biggest threats to cheetahs are land development resulting in loss of suitable habitat and prey availability, and indiscriminate removals and killing (Cheetah removal).

Commercial and communal farmlands support an integrated livestock and wildlife based economy. Productive rangelands provide a direct economic incentive to farmers, therefore stocking rates and livestock densities are raised, especially in good rainfall seasons. Economic benefits may be derived from both agriculture and conservation enterprises. Good economic profits improve tolerance, attitudes and perceptions towards predators (Marker, 2002), whereas poor economic benefits may compel the removal of carnivores perceived as threats. This is especially the case in areas with no conservancies where benefits from wildlife resources are limited due to user rights. The human team believed that increased economic benefit may also have a negative effect, as farmers then stand to lose more due to game or livestock loss to cheetahs.

The Legislation type includes acts, policies, and approved protocols to prevent the decline of an endangered species, protect suitable habitat and promote coexistence with farmers, and includes those administered by the Ministry of Agriculture, Ministry of Environment and Tourism, and Ministry of Land and Resettlement. The existence of policies is an important first step in wildlife conservation, but they are ineffective without enforcement (Legislation implementation). Furthermore, although major wildlife policies affecting cheetah survival do exist, some are considered insufficient or outdated. An example is the 1975 Nature Conservation Ordinance, which is widely quoted. Currently there are bans on captive breeding of cheetah even though this could help maintain genetic diversity. Land resettlement policies could have a direct impact on habitat suitability for the free-ranging cheetah population, especially where individual farms are reduced to smaller units, thereby increasing the risk of rangeland degradation and low productivity.

Human habitat impact on cheetah population viability may be incidental, purposeful or unintentional and is dependent on the effects of four key factors: Land use, Human Population Growth, Farmer Education and Legislation Implementation.

6.6.2 Background to Ecological subnetwork

Namibian cheetahs are known to show prey selection for native game species (Marker et al., 2002). However, Prey Availability is dependent on numerous factors such as habitat parameters as well as climatic factors. Besides these rather intrinsic factors, the legal and illegal removal of potential prey animals (Prey poaching) due to human activities such as poaching and game harvesting (this being for biltong, meat and trophy hunting) plays an important role in the availability of the cheetah’s prey. Additionally, anthropogenic habitat changes influence prey densities indirectly, mainly due to overgrazing which leads to bush encroachment (de Klerk, 2004) (Human habitat impact).

Land use and vegetation structure would also affect the amount of available space for cheetahs and the carrying capacity of that area. Ecological research revealed that Namibian cheetahs prefer habitat patches with grassy cover and high visibility (Muntifering et al., 2006). Intraguild density and available space affect Intraspecific density of cheetahs. A higher Intraspecific density of cheetahs would lower the Prey availability in the area. In addition, the type of Land use affects the Intraguild density or numbers of larger predators (e.g. lions and spotted hyenas specifically) that would compete with cheetahs in three ways: a) reducing the amount of prey available, b) scavenging carcasses from cheetahs and c) killing cheetah cubs (Laurenson et al., 1995; Durant, 2000). On-going census research provides an indication of cheetah densities through a variety of methodologies (Fabiano, 2007; Marker et al., 2008b).

Another very important factor leading from the human factors network into the ecological network is the direct removal of cheetahs by humans (Cheetah removal). Both legal (trophy hunting, reported removal of problem animals) and illegal removal of cheetahs will have a substantial effect on the density and abundance of cheetahs. The level of removals of cheetahs within Namibia is poorly understood. While the quota of 150 cheetahs per annum for trophy hunting (CITES, 1992) is almost never met, the levels of illegal removal remain largely unknown.

6.6.3 Background to Biological subnetwork

Females of a large number of species choose sires of their offspring in a non-random way, selecting males with traits signalling male quality (Harvey and Bradbury, 1991). In the Serengeti National Park in Tanzania, female cheetahs seem to prefer sires holding a territory compared to males roaming in large home ranges (Gottelli et al., 2007). Territorial males are in better physical condition than non-territorial males and males strongly compete for territories (Caro, 1994). In Namibia, such competing areas may be areas that include clusters of marking trees, since males that are frequently found in such areas are in better physical condition than males using large home ranges (Melzheimer and Wachter, unpublished data). For this study, it was assumed that Namibian cheetah females preferably choose males that are regularly found in areas with marking tree clusters, and the fluctuation of such males was considered to be crucial for Female mate choice (the mate choice exercised by females from the available number of males within her range).

Fluctuation of males ranging around marking tree clusters was considered to be influenced by three factors: Cheetah removal (input from human factors OOBN), Intraspecific density (input from ecological factors OOBN) and Immigration-emigration (a factor representing the quantitative difference between immigrating and emigrating cheetahs), the last catering for the spatial distribution of free-ranging cheetahs, which have been shown to have extensive home ranges (Marker et al., 2008a). Cheetah removal on commercial farmland usually occurs at marking trees because marking trees are highly attractive to cheetahs and hence the likelihood of capturing a cheetah is high at marking trees. The sex-ratio of cheetahs removed by this technique is heavily biased towards males (approximately 95% males vs. 5% females) (Marker-Kraus et al., 1996). As a result, large numbers of potential sires, including males favoured by females, are removed. This leads to a higher turnover of attractive males and thus an increased number of different males available to females to select from in a given time period. This scenario was considered to increase opportunities for females to exercise mate choice.

Cheetah removal was also assumed to decrease the genetic variability of the cheetah population by causing a reduction in population size (Recruitment represents male and female cheetahs surviving to adulthood), and to increase Mortality (the number of adult male and female cheetahs dying). Furthermore, increased opportunities for females to exercise mate choice were assumed to increase genetic variability, since females choose different partners for each litter and often also within litters (Gottelli et al., 2007). It was not known how many partners were available in a Namibian cheetah female’s range, but it was assumed that a higher turnover of males leads to increased partner changes. Thus, from the Genetic node point of view, there was a negative, direct effect from the Cheetah removal node and a positive, indirect effect from the Female mate choice node. For the model the positive effect was assessed to be slightly stronger than the negative effect.
Cheetahs are known for their low genetic variability (O'Brien et al., 1983; O'Brien et al., 1985; Menotti-Raymond and O'Brien, 1993). However, there is no evidence that this negatively affects the reproduction and Health (general health regarding diseases, injuries, genetic defects and energy turnover) of free-ranging cheetahs (Laurenson et al., 1992; Laurenson, 1994; Marker and Dickman, 2003; Munson et al., 2005; Thalwitzer, 2007), and cheetahs show regional genetic differences (Marker et al., 2008b). Thus, the ‘normal’ low genetic variability was considered not to cause any genetic defects through deleterious alleles in the gene pool or any critically low functional genetic diversity. A decrease in this species-specific low genetic variability due to changes from incoming nodes in the biological factor subnet, however, was assumed to eventually lead to genetic defects if the negative effects are strong. The Genetic node was therefore defined as the genetic variability of the cheetah population (which can either increase or decrease) and Stress as disturbance of the homeostasis due to aversive stimuli.


Chapter 7 A Bayesian Network approach to modelling temporal behaviour of Lyngbya majuscula bloom initiation This chapter has been written as a conference paper, for which I am first author. The paper is co-authored by Professor Kerrie Mengersen. The paper has been peer reviewed and accepted for presentation at the 18th World IMACS Congress and MODSIM09 International Congress on Modelling and Simulation in Cairns, Australia in July 2009.

A Bayesian Network approach to modelling temporal behaviour of Lyngbya majuscula bloom initiation

Johnson, S.¹ and K. Mengersen¹

¹ School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland

The study applies the IBNDC heuristic to the annual review of the Lyngbya Science BN, developed during the Lyngbya Research and Management Program (2005-2007), and consequently the ensuing modifications to this network. The application of the IBNDC methodology is illustrated in the extended version of the paper, which is in preparation for submission to the Journal of Applied Ecology, but is not described in the conference paper due to paper length constraints. The IBNDC heuristic is applied within an OO framework to represent the temporal behaviour of Lyngbya bloom initiation using both subclassing and reuse characteristics of OO modelling techniques. The static Lyngbya Science BN is thus transformed into an OOBN enabling the creation of monthly time slices. This chapter highlights the key role that expert elicitation plays in defining the structure of a BN, as well as its quantification and verification. Expert elicitation continues to play a vital role during the review and redevelopment of the model.


Abstract

Lyngbya majuscula is a cyanobacterium (blue-green algae) occurring naturally in tropical and subtropical coastal areas worldwide. Deception Bay, in Northern Moreton Bay, Queensland, has a history of Lyngbya blooms, and forms a case study for this investigation. The South East Queensland (SEQ) Healthy Waterways Partnership, a collaboration between government, industry, research and the community, was formed to address issues affecting the health of the river catchments and waterways of South East Queensland. The Partnership coordinated the Lyngbya Research and Management Program (2005-2007) which culminated in a Coastal Algal Blooms (CAB) Action Plan for harmful and nuisance algal blooms, such as Lyngbya majuscula. This first phase of the project was predominantly of a scientific nature and also facilitated the collection of additional data to better understand Lyngbya blooms. The second phase of this project, SEQ Healthy Waterways Strategy 2007-2012, is now underway to implement the CAB Action Plan and as such is more management focussed. As part of the first phase of the project, a Science model for the initiation of a Lyngbya bloom was built using Bayesian Networks (BN). The structure of the Science Bayesian Network was built by the Lyngbya Science Working Group (LSWG), which was drawn from diverse disciplines. The BN was then quantified with annual data and expert knowledge. Scenario testing confirmed the expected temporal nature of bloom initiation and it was recommended that the next version of the BN be extended to take this into account. Elicitation for this BN thus occurred at three levels: design, quantification and verification. The first level involved construction of the conceptual model itself, definition of the nodes within the model and identification of sources of information to quantify the nodes.
The second level included elicitation of expert opinion and representation of this information in a form suitable for inclusion in the BN. The third and final level concerned the specification of scenarios used to verify the model. The second phase of the project provides the opportunity to update the network with the newly collected detailed data obtained during the previous phase of the project. Specifically, the temporal nature of Lyngbya blooms is of interest. Management efforts need to be directed to the periods most vulnerable to bloom initiation in the Bay. To model the temporal aspects of Lyngbya we are using Object Oriented Bayesian networks (OOBN) to create ‘time slices’ for each of the periods of interest during the summer. OOBNs provide a framework to simplify knowledge representation and facilitate reuse of nodes and network fragments. An OOBN is more hierarchical than a traditional BN, with any sub-network able to contain other sub-networks. Connectivity between OOBNs is an important feature and allows information flow between the time slices. This study demonstrates more sophisticated use of expert information within Bayesian networks, which combine expert knowledge with data (categorized using expert-defined thresholds) within an expert-defined model structure. Based on the results from the verification process the experts are able to target areas requiring greater precision and those exhibiting temporal behaviour. The time slices incorporate the data for that time period for each of the temporal nodes (instead of using the annual data from the previous static Science BN) and include lag effects to allow the effect from one time slice to flow to the next time slice. We demonstrate a concurrent steady increase in the probability of initiation of a Lyngbya bloom and conclude that the inclusion of temporal aspects in the BN model is consistent with the perceptions of Lyngbya behaviour held by the stakeholders. This extended model provides a more accurate representation of the increased risk of algal blooms in the summer months and shows that the opinions elicited to inform a static BN can be readily extended to a dynamic OOBN, providing more comprehensive information for decision makers.

Keywords: Algal bloom, Lyngbya majuscula, Bayesian Belief Network, BN, Object Oriented, OOBN


7.1 Introduction

The presence and intensity of Lyngbya blooms have been officially recorded by the EPA since 2000 (Hodgkinson and Cox, 2005). Figure 7.1, which is based on this data, shows a box plot of the monthly intensity of Lyngbya blooms in Deception Bay from January 2000 to December 2005.

[Figure 7.1 plot: box plot; y-axis: Bloom Intensity (0.0 to 3.0); x-axis: Month (January to December)]

Figure 7.1: Box plot of intensity of Lyngbya blooms in Deception Bay from January 2000 to December 2005

From the box plot it is apparent that the initiation of a Lyngbya bloom is of increasing concern with the onset of summer. For this reason it was recommended at the annual Lyngbya Science BN Review meeting (1 December 2008) that the temporal nature of Lyngbya blooms be represented more accurately. The original structure of the BN was elicited from the experts in the Lyngbya Science Working Group during the first phase of the Lyngbya Research and Management Program (Hamilton et al., 2007b). The BN structure was also reviewed at the annual BN Review meeting and the resulting modified model is shown in Figure 7.2. The temporal nodes of interest were identified as Temperature, Rainfall, No of Previous Dry Days, Surface Light, Wind Speed, and Wind Direction. These nodes are depicted in the model below as ‘input’ nodes (shown as an ellipse with a broken line), which means that they act as place holders for the actual data for the months of September to March.


Figure 7.2: Lyngbya Science BN as a generic OOBN showing the input nodes with broken lines (Hugin®)

Hugin® software is widely used for BN modelling, is easy to use and has good object oriented modelling capabilities. Figure 7.2 shows the Lyngbya Science BN as an Object Oriented Bayesian Network (OOBN), which enables it to be used as a ‘blueprint’ for the seven months of interest. In other words, it is a generic network which can be used (instantiated via an instance node) in each of the monthly networks. The monthly data is then added for each of the six input nodes to create the OOBN specific to that month.
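The idea of reusing one generic network for several time slices can be sketched in a few lines of Python. This is an illustration only and not the Hugin® model: the class name `GenericNetwork`, the two-input structure and every probability below are hypothetical, and the inputs are treated as independent root nodes whose distributions are simply replaced for each time slice.

```python
from itertools import product


class GenericNetwork:
    """Toy stand-in for a generic OOBN class with placeholder input nodes."""

    def __init__(self, input_states, output_cpt):
        # input_states: {input node name: tuple of its state names}
        # output_cpt: {tuple of input states, in node order: P(output occurs)}
        self.input_states = input_states
        self.output_cpt = output_cpt

    def instantiate(self, input_distributions):
        """Bind one time slice's distributions to the input nodes and
        return the marginal probability of the output node."""
        nodes = list(self.input_states)
        for node in nodes:
            if abs(sum(input_distributions[node].values()) - 1.0) > 1e-6:
                raise ValueError(f"{node}: probabilities must sum to 1")
        p_output = 0.0
        # Enumerate every joint configuration of the input nodes.
        for combo in product(*(self.input_states[n] for n in nodes)):
            weight = 1.0
            for node, state in zip(nodes, combo):
                weight *= input_distributions[node][state]
            p_output += weight * self.output_cpt[combo]
        return p_output


# Hypothetical two-input network; none of these numbers come from the study.
generic = GenericNetwork(
    input_states={"Temperature": ("Low", "High"), "Wind Speed": ("Low", "High")},
    output_cpt={
        ("Low", "Low"): 0.2,
        ("Low", "High"): 0.1,
        ("High", "Low"): 0.4,
        ("High", "High"): 0.3,
    },
)

# The same generic network is reused for each (hypothetical) time slice.
slice_inputs = {
    "slice_a": {
        "Temperature": {"Low": 0.9, "High": 0.1},
        "Wind Speed": {"Low": 0.8, "High": 0.2},
    },
    "slice_b": {
        "Temperature": {"Low": 0.3, "High": 0.7},
        "Wind Speed": {"Low": 0.9, "High": 0.1},
    },
}
slice_probability = {
    name: generic.instantiate(dists) for name, dists in slice_inputs.items()
}
```

Only the input distributions change between slices; the internal table of the generic network is defined once and shared, which is the reuse property the OOBN provides.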


7.2 Temporal Node Quantification

Every node in a BN has a conditional probability table (CPT) associated with it. The temporal nodes considered here were populated using the DPI Forestry’s Meteorological Records at Beerburrum containing daily readings from November 1999 to November 2005.

7.2.1 Rainfall

The Rainfall node has three expert-defined states: Low (0-5mm/day), Medium (6-25mm/day) and High (25+mm/day). The resulting CPT, which was used to quantify the Rainfall node for each of the seven monthly BNs, is shown in Table 7.1. It gives the number of days which were recorded as having low, medium or high rainfall as a percentage of the total number of days for that month (averaged over six years of data).

Table 7.1: Monthly Rainfall (%)
          Sep    Oct    Nov    Dec    Jan    Feb    Mar
Low      97.2   86.0   81.7   77.4   86.0   81.2   81.2
Medium    2.8   10.8   12.2   13.4   10.8   10.0   15.6
High      0.0    3.2    6.1    9.1    3.2    8.8    3.2

[Figure 7.3 plot: y-axis: Mean no. of days per month (0 to 30); x-axis: Month (January to December); legend: Low Rainfall, Medium Rainfall, High Rainfall]
Figure 7.3: Mean Rainfall patterns (Nov 1999 to Oct 2005)

Figure 7.3 clearly shows that the average number of days with low rainfall is greater than the number of medium or high rainfall days throughout the year. However, in the seven months of interest to Lyngbya bloom initiation, and also in April, the average number of days having medium and high rainfall is greater than in other months. From the Lyngbya Science OOBN in Figure 7.2 we can see that this in turn would have an effect on the Groundwater Amount (water stored below the earth’s surface), Land Run-off (overland flow of water) and Air Load (nutrient load from Aeolian sources).
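The conversion of daily rainfall readings into monthly percentages can be sketched as follows. This is a minimal illustration, not the program used in the study: the function names and the sample readings are hypothetical, and the handling of boundary values (readings between 5 and 6mm, and exactly 25mm) is an assumption, since the state definitions above do not settle it.

```python
from collections import Counter

STATES = ("Low", "Medium", "High")


def rainfall_state(mm):
    """Map a daily rainfall total (mm) to a state.

    Assumed boundaries: up to 5mm is Low, above 5mm and up to 25mm is
    Medium, above 25mm is High.
    """
    if mm <= 5:
        return "Low"
    if mm <= 25:
        return "Medium"
    return "High"


def monthly_percentages(daily_mm_by_month):
    """Return {month: {state: percentage of days}} from pooled daily readings."""
    table = {}
    for month, readings in daily_mm_by_month.items():
        if not readings:
            continue  # no data recorded for this month
        counts = Counter(rainfall_state(mm) for mm in readings)
        n_days = len(readings)
        table[month] = {
            state: round(100.0 * counts[state] / n_days, 1) for state in STATES
        }
    return table


# Hypothetical readings, pooled over all years for one calendar month.
example = {"Dec": [0, 0, 3, 12, 30, 0, 7, 0, 0, 1]}
rainfall_table = monthly_percentages(example)
```

Pooling all days of a calendar month across the six years before taking percentages is one way to obtain a single row per month, as in Table 7.1.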

7.2.2 Number of Previous Dry Days

Here expert opinion defined the node, No of prev dry days, as the cumulative number of days where rainfall is less than 5mm. Expert opinion also determined the states and thresholds for this node as follows: for any day in the month, if the number of previous dry days was 2 or less, it was assigned a value of Low; Medium indicates 3 to 5 previous dry days and High more than 6 days having less than 5mm rain. The CPT to populate this node for each of the seven months is given in Table 7.2.

Table 7.2: Previous number of dry days (%)
          Sep    Oct    Nov    Dec    Jan    Feb    Mar
Low       8.9   31.2   38.9   43.0   30.1   38.8   44.1
Medium    8.9   17.2   24.4   19.4   19.4   19.4   23.7
High     82.2   51.6   36.7   37.6   50.5   41.8   32.3

[Figure 7.4 plot: y-axis: Prev Dry Days (0 to 25); x-axis: Month (January to December); legend: No of Prev Dry Days: Low, Medium, High]
Figure 7.4: Mean number of days which had low, medium and high previous dry days (Nov 1999 to Oct 2005)

Figure 7.4 illustrates the combination of mean number of days with low, medium and high previous dry days for every month. As expected, since Deception Bay is in a summer rainfall area, the graph shows the winter months of June to August having on average the greatest number of days with a high number of previous dry days. For the seven months being modelled for Lyngbya bloom initiation, however, there tends to be a more even distribution between the number of days with low, medium and high previous dry days.
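One way to derive this node from a daily rainfall series is sketched below. The function names are hypothetical and two assumptions are made: ‘cumulative’ is read as the unbroken run of dry days immediately preceding the day in question, and a count of exactly 6, which the thresholds above leave unassigned, is placed in High.

```python
def previous_dry_days(daily_mm, dry_threshold_mm=5.0):
    """For each day, count the consecutive dry days immediately before it.

    A day is treated as dry when its rainfall is below the threshold.
    """
    counts = []
    run = 0
    for mm in daily_mm:
        counts.append(run)  # dry days leading up to (not including) this day
        run = run + 1 if mm < dry_threshold_mm else 0
    return counts


def dry_day_state(n_prev):
    """Map a count of previous dry days to Low, Medium or High.

    Assumption: 6 or more previous dry days is High.
    """
    if n_prev <= 2:
        return "Low"
    if n_prev <= 5:
        return "Medium"
    return "High"


# Hypothetical ten-day rainfall record (mm); the third day is wet.
rain = [0, 0, 10, 0, 0, 0, 0, 0, 0, 0]
dry_counts = previous_dry_days(rain)
dry_states = [dry_day_state(n) for n in dry_counts]
```

The wet day resets the run, so the count starts again from zero on the following day.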


7.2.3 Temperature

The LSWG defined the Temperature node to be the temperature of the water column. It has only two expert-defined states, Low or High. A study by Hamilton et al. (2007a), evaluating several alternative models, found that a model including only the minimum monthly temperature had the best predictive behaviour for Lyngbya bloom initiation. Furthermore, air temperature was used as an acceptable approximation for water temperature, typically differing by only approximately one degree. Therefore the CPT for this node (Table 7.3) used the daily minimum air temperature recorded at Beerburrum. Expert opinion set 17°C as the cut-off point, so that if the minimum daily temperature was more than 17°C it was recorded as a high minimum temperature; otherwise it was classed as low. This cut-off can be varied at a subsequent annual review meeting if there are scientific reasons to differentiate between the states relative to another temperature.

Table 7.3: CPT for Minimum Temperature (%)
          Sep    Oct    Nov    Dec    Jan    Feb    Mar
Low      98.3   86.6   72.7   34.8   18.8   22.9   25.8
High      1.7   13.4   27.3   65.2   81.2   77.1   74.2

[Figure 7.5 plot: box plot; y-axis: Low Min Temp (days), 0 to 35; x-axis: Month (January to December)]

Figure 7.5: Box plot of the number of days with a low minimum temperature (Jan 2000 to Oct 2005)

Figure 7.5 is a box plot of the daily minimum air temperatures grouped by month for the six years. The means and medians are generally approximately equal, but the observations appear heavily skewed to the right in February and somewhat skewed in October and November. Nonetheless there appears to be a definite trend for the number of days having low minimum temperatures to start increasing from March and then more rapidly to peak in the winter months, before slowly declining from September and more quickly from October.


7.2.4 Wind Direction

The Wind Direction node was defined by the LSWG to represent the measured course of the wind, relative to the compass. Expert opinion nominated only two states to be relevant to Lyngbya bloom initiation: southeast (SE) and north (N). All other directions were grouped as Other. The wind direction at Beerburrum is recorded three times daily (9am, noon and 3pm). Watkinson et al. (2005) noted that during Lyngbya bloom initiation there were north to northeast (NE) winds. Therefore, based on expert guidance, the daily wind direction was designated as N if any of the recorded wind directions for the day was N, NNE or NE. In a similar vein, if any reading during the day recorded a SE wind, the daily value was taken to be SE. Figure 7.6 shows the composition of the mean number of days per month for wind directions of N, SE and Other, and Table 7.4 shows the percentage allocations for each of the three directions for the months of September to March.

Table 7.4: CPT for Wind Direction (%)
          Sep    Oct    Nov    Dec    Jan    Feb    Mar
North    41.7   52.7   43.3   43.0   30.6   20.0   18.3
SE       28.9   28.5   41.1   35.5   53.8   51.8   53.8
Other    29.4   18.8   15.6   21.5   15.6   28.3   28.0

[Figure 7.6 plot: y-axis: Mean no. of days per month (0 to 20); x-axis: Month (January to December); legend: WindDirection: North, SE, Other]
Figure 7.6: Mean number of days per month for N, SE and Other wind direction (Nov 1999 to Oct 2005)

Watkinson et al. (2005) took wind direction and wind speed from observations made at noon, in contrast to the way it was determined here. The flexibility exists to alter the program code if the next review recommends the use of only the noon reading.
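The daily designation rule can be sketched as below. This is an illustrative reading of the rule rather than the program code used in the study: the function name is hypothetical, missing readings are represented by `None`, and giving a northerly reading precedence when a day has both northerly and SE readings is an assumption, because the text does not say how such days were resolved.

```python
NORTHERLY = {"N", "NNE", "NE"}


def daily_wind_direction(readings):
    """Reduce a day's compass readings (9am, noon, 3pm) to one state.

    Returns "North", "SE" or "Other", or None when no reading is available.
    Assumption: a northerly reading takes precedence over a SE reading.
    """
    observed = {r for r in readings if r is not None}
    if not observed:
        return None
    if observed & NORTHERLY:
        return "North"
    if "SE" in observed:
        return "SE"
    return "Other"


def noon_only_direction(readings):
    """Alternative using only the noon reading (second entry), as an option
    a later review might prefer."""
    return daily_wind_direction([readings[1]])
```

Keeping the noon-only variant as a separate function makes it straightforward to switch rules if the review recommends it.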


7.2.5 Wind Speed

The Wind Speed node represents the rate at which the wind travels over the surface of the water and has only two expert-defined states, High and Low. Using the Beaufort wind force scale, the wind speed reading for the day was assigned a value of Low if the average wind force was less than 3 and High if it was 3 or more. Table 7.5 shows the resulting CPT for this node, giving the percentage of days in each month that were deemed to have a high or low wind speed, based on records from November 1999 to October 2005.

Table 7.5: CPT for Wind Speed (%)
          Sep    Oct    Nov    Dec    Jan    Feb    Mar
Low      80.3   91.9   89.4   91.8   87.5   95.3   91.4
High     19.7    8.1   10.6    8.2   12.5    4.7    8.6

[Figure 7.7 plot: box plot; y-axis: High Wind Speed (days), 0 to 9; x-axis: Month (January to December)]
Figure 7.7: Box plot of the number of days with a high average wind speed (Nov 1999 to Oct 2005)

The daily average wind speed was calculated using all the daily recorded readings (9am, noon and 3pm) if they were present. This can be changed to the noon reading only, as was done by Watkinson et al. (2005). However, the average speed appeared to be a good representation of the wind speed for the day and furthermore enables us to use more days of data when the noon reading was missing, but the 9am or 3pm reading was available. Figure 7.7 shows the box plot of the daily readings which were classified as having high wind speed, grouped by month.
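The averaging over whichever readings are present can be sketched as follows. The function name is hypothetical, and the sketch assumes the readings have already been expressed as Beaufort force numbers, with `None` marking a missing reading.

```python
def daily_wind_speed_state(beaufort_readings, threshold=3):
    """Classify a day as "High" or "Low" from its available readings.

    The mean is taken over the readings that are present, so a day with a
    missing noon value can still be used. Returns None if nothing was recorded.
    """
    available = [r for r in beaufort_readings if r is not None]
    if not available:
        return None
    mean_force = sum(available) / len(available)
    return "High" if mean_force >= threshold else "Low"


# Hypothetical days: (9am, noon, 3pm) Beaufort force readings.
days = [(2, None, 3), (None, 4, 3), (2, 3, 4), (None, None, None)]
speed_states = [daily_wind_speed_state(day) for day in days]
```

The second day would be lost under a noon-only rule if its noon reading were missing, which is the practical advantage of averaging noted above.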


7.2.6 Surface Light

At the annual BN review meeting in December 2008, the Surface Light node was added to the Lyngbya Science network and defined to represent the total available photosynthetically active radiation (PAR) light as measured above the water surface. Populating this node necessitated elicitation of expert opinion not only to define the states and thresholds, but also to propose an alternative measurement to the preferred calculation for surface light as specified by Watkinson et al. (2005). The data from their study spanned only a few months and was therefore not suitable for a monthly probability distribution. Consequently we needed to find an alternative measurement which would provide a good approximation for this node. This is a well known and acceptable practice in BN modelling (Borsuk et al., 2006). To this end the cloud cover readings from Beerburrum were used. These readings indicate the level of cloud cover, measured in oktas. Therefore the maximum reading for any day with readings at 9am, noon and 3pm is 24. The cloud cover readings were translated into Adequate and Inadequate, the two expert-defined states of the Surface Light node, by comparing the daily average to 14. At the next BN review this approximation of the node and the cut-off can be further debated and changed if required. The resulting CPT is shown in Table 7.6 and a box plot of the number of days per month with adequate surface light based on the Beerburrum data from November 1999 to October 2005 is shown in Figure 7.8.

Table 7.6: CPT for Surface Light (%)
              Sep    Oct    Nov    Dec    Jan    Feb    Mar
Adequate     88.9   67.9   67.2   59.7   64.6   56.2   63.1
Inadequate   11.1   32.1   32.8   40.3   35.4   43.8   36.9

[Figure 7.8 plot: box plot; y-axis: Adequate Surface Light (days), 10 to 30; x-axis: Month (January to December)]
Figure 7.8: Box plot of the number of days that had adequate surface light, grouped by month (Nov 1999 to Oct 2005)

7.3 Lyngbya Bloom Initiation

The seven monthly BNs were created using the generic OOBN which represents the reviewed Lyngbya Science BN illustrated in Figure 7.2. The Lyngbya Bloom Initiation OOBN for December is shown in Figure 7.9.

Figure 7.9: Lyngbya Science OOBN for December (Hugin®)

The six input nodes for Rain - present, No of prev dry days, Wind Direction, Wind Speed, Surface Light and Temperature can be seen within the rectangular shaped box with rounded edges. This box is the generic Lyngbya Science OOBN. The temporal nodes for December were populated with the CPTs for December (Tables 7.1 to 7.6) and then connected to the input nodes of the generic OOBN. Therefore the specific information for December will then flow through the generic Lyngbya Science BN, taking into account all the interactions modelled in this network, to give the probability of a Lyngbya bloom initiation for December. From the static Lyngbya Science BN the probability of a Lyngbya bloom initiation is 25.3% and for December this increases to 26.8%. The other monthly OOBNs of interest were compiled and run in the Hugin® package and the resulting probabilities are shown in Figure 7.10.

[Figure 7.10 plot: line graph; y-axis: P(Bloom) %, 20 to 29; x-axis: Month (Sep to Mar)]
Figure 7.10: Line graph of the probability of Lyngbya bloom initiation from the temporal BNs

We can see that the probability of Lyngbya bloom initiation was less than 25.3% for the months of September to November, but a bloom initiation was more probable for the months of December to March. This graph clearly demonstrates an increase in the probability of Lyngbya bloom initiation in the summer months, with the rate of increase most notable from October to November and November to December, and slowing down but still increasing from December to January. Thereafter the probability, although still elevated, flattens out. The results presented here are generally in keeping with experts’ expectations for Lyngbya bloom initiation behaviour, although the elevated probabilities in February and March are perhaps somewhat surprising in that the drop in the bloom initiation probability from January to February and March is not more rapid.

7.4 Discussion and Conclusions

Substantial amounts of expert opinion can be used to inform a static BN. Typically this includes: helping define BN model structure; populating CPTs in the absence of data; and specifying thresholds of nodes populated either by empirical data or expert opinion. We demonstrated here the use of object oriented (OO) Bayesian networks to investigate the temporal behaviour of Lyngbya bloom initiation and that expert opinion elicited for a static BN can be reused and extended for a dynamic BN. Two aspects of OO modelling used in this study were reuse and subclassing. We were able to transform the static Lyngbya Science BN very simply into an OOBN so that it could be reused for the monthly time slices. Furthermore by using the specific data for each month of interest we ‘subclassed’ the OOBN to take on the specific values for that month (Koller and Pfeffer, 1997) while retaining the elicited values for all the non-temporal nodes. Expert opinion was necessary to adjust the thresholds for Rainfall, No of prev dry days and Temperature to suit a monthly rather than an annual time scale. Additional information for the Wind Direction, Wind Speed and Surface Light nodes was elicited for the dynamic OOBN. This included thresholds for the nodes and interpreting and summarising the available daily readings (including missing data) to populate the CPTs. Hodgkinson and Cox (2005) observed that Lyngbya blooms, although occurring in both dry and wet months, only occurred in a dry month if it was preceded by a wet period or if there was already a bloom present. They also noted that the majority of blooms occurred after a dry period which was followed by wet periods sometimes with a lag effect. The next stage of modelling Lyngbya bloom initiation in a more dynamic way would be to include these lag effects so that the influence of one month flows through to the next. 
This connectivity between sub-networks, allowing information flow between them, is an important characteristic of OOBN modelling and one which ought to be elicited from the experts so that this behaviour can be represented in the seven time slices. Moreover we recommend that the time slices be considered in more detail by the science experts with the purpose of expanding temporal behaviour for a particular month if deemed necessary. In other words, instead of only populating the input nodes of the generic network with data specific to that time slice, additional nodes and interactions could be added which are specific only to that time slice. This is a typical subclassing activity in OOBN modelling (Koller and Pfeffer, 1997). The inclusion of temporal aspects of Lyngbya bloom initiation in this study has provided an improved representation of the risk of algal blooms in the months of interest to stakeholders. However, utilising more of the characteristics of OOBN modelling as outlined here would enable Lyngbya bloom initiation to be modelled more precisely. Moreover this study illustrates the ease with which expert knowledge can be transferred from a single static model to provide simple dynamic models of ecological processes, such as Lyngbya bloom initiation in Moreton Bay. However, we identify the need for further elicitation or ‘sub-classing’ to target the temporal-dependent behaviour of the blooms to provide more precise dynamic models.

ACKNOWLEDGEMENTS

Financial assistance was provided by the Environmental Protection Agency and Australian Government through the South East Queensland Healthy Waterways Partnership, the ARC Centre for Dynamic Systems and Control, and QUT Institute for Sustainable Resources. We fully acknowledge the contributions of the Lyngbya Science Working Group and the BN Review team. We wish to thank three anonymous reviewers for their constructive comments.

REFERENCES

Borsuk, M.E., Reichert, P., Peter, A., Schager, E., and Burkhardt-Holm, P. (2006), Assessing the decline of brown trout (Salmo trutta) in Swiss rivers using a Bayesian probability network. Ecological Modelling, 192, 224-44.

Hamilton, G., McVinish, R., and Mengersen, K. (2007a), Bayesian model identification and averaging for coastal algal bloom prediction. (unpublished).

Hamilton, G.S., Fielding, F., Chiffings, A.W., Hart, B.T., Johnstone, R.W., and Mengersen, K. (2007b), Investigating the Use of a Bayesian Network to Model the Risk of Lyngbya majuscula Bloom Initiation in Deception Bay, Queensland. Human and Ecological Risk Assessment, 13, 1271-1279.

Hodgkinson, J. and Cox, M. (2005), Lyngbya blooms in relation to temperature, rainfall, SOI and tides: 2000 to 2005. Moreton Bay Waterways and Catchment Partnerships.

Koller, D. and Pfeffer, A. (1997), Object-Oriented Bayesian Networks. Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-97), Providence, Rhode Island, 1-3 August 1997, 302-313.

Watkinson, A.J., O'Neil, J.M., and Dennison, W.C. (2005), Ecophysiology of the marine cyanobacterium, Lyngbya majuscula (Oscillatoriaceae) in Moreton Bay, Australia. Harmful Algae, 4, 697-715.


Chapter 8 Discussion

As stated in the title, the methodological aim of this thesis is the development of a Bayesian Network framework which facilitates both the creation of integrated statistical models within a multi-field and multi-expert context, as well as the integration of existing disparate statistical models. We also focus on the following ecological objectives in the applied aims of this thesis:

• Obtain a more comprehensive understanding of Lyngbya majuscula bloom initiation

• Improve the success rate of relocations of captured wild cheetahs in South Africa and Botswana

• Ensure the viability of free-roaming cheetah populations in Namibia

The thesis therefore has a dual focus on methodology and application. The methodological focus centres on the design and integration of BNs. The applied ecological focus is on integrating multiple models of Lyngbya majuscula and designing and validating BN models for cheetah relocation and wild cheetah population viability. In tackling these problems we utilise BNs to model and integrate the current information about these issues. Moreover we do this within an OO framework to which BNs are ideally suited.

In chapters 3, 4 and 7 our attention was on improving our understanding of Lyngbya by integrating various models of the algal bloom, the factors leading to bloom initiation and exploring its temporal nature. The Lyngbya Research and Management project initiated by SEQ Healthy Waterways Partnership motivated the creation of several models to better understand the dynamics of Lyngbya bloom initiation and behaviour. In chapter 3 we integrated a Science BN and a management network via a catchment model which modelled the flow of nutrient loads through the catchment and into Deception Bay. The integration was through scenario testing of the Science BN, a typical activity performed as part of BN model validation and evaluation. The scenario testing confirmed the temporal nature of Lyngbya bloom initiation. We then applied OOBN modelling techniques to represent and explore the dynamic nature of bloom initiation. The methodological contribution here was the process of evolving a static BN into a dynamic BN through an OO framework to create an integrated set of OOBN models which inform each other and the outcome of interest.

In chapter 4 the Lyngbya models which were integrated were the Science BN, the Management network and a GIS-based Hazard Map. The motivation for this integration was the potential use of the combined model as a management tool to assess the potential impact on Lyngbya bloom initiation as a result of a proposed development or change in land use in the catchment area. This was the applied contribution of this chapter. The methodological contribution was the identification of the point of integration, and the flexible integration procedure through the BN framework.

In chapter 5 our ecological focus shifted to the conservation of cheetahs and then in chapter 7 we returned to look at Lyngbya with the benefit of lessons learnt in chapters 5 and 6. Our applied contribution in chapters 5 and 6 was not only the comprehensive representation of current knowledge on cheetah relocation and wild cheetah population viability, but also the stated intention by the cheetah experts to utilise these BNs in the national management strategy workshops in South Africa and Namibia, where the future direction of cheetah conservation management in those countries will be discussed and ratified. The methodological contribution in chapter 5 formed the pivotal contribution of these last three chapters, with chapters 6 and 7 enhancing and applying this BN design approach. The approach is the new heuristic, Iterative Bayesian Network Development Cycle (IBNDC), which is defined and explained in chapter 5. The IBNDC guides BN model design within an object oriented framework and is therefore conducive to the creation of integrated and dynamic networks and the refinement, reuse and redesign of existing static BNs.
The specific applied contribution in chapter 5 is the design, quantification and validation of four BNs for cheetah relocation success in South Africa and Botswana. The BN representing relocations into fenced protected areas in South Africa will be demonstrated to stakeholders at the South African National Conservation Action Planning Workshop for cheetahs on 18 and 19 June 2009. The applied contribution in chapter 6 is a BN model of wild cheetah population viability in Namibia. This was the subject of a BN modelling workshop in Namibia attended by local and international cheetah experts and a government representative from the Ministry of Environment and Tourism. The stated intention of the cheetah expert team is to use this BN as the basis for the Namibian National Cheetah Planning workshop planned in Namibia for 2009. The methodological contribution here was the enhancement of the IBNDC approach, which guided the development of this BN, to identify the integration points between three subnetworks. This facilitated the parallel development of three subnetworks for subsequent integration into an overall BN.

Having applied the IBNDC to create a new integrated BN in chapter 6, we reviewed an existing BN in chapter 7. The review of a BN is the precursor to the phases in the Iterative Process of the IBNDC, which is critical to ensure that the BN is continually revised and updated in light of new information and research. The applied contribution in this chapter is the set of monthly BNs which model Lyngbya bloom initiation at a finer scale for the months of interest to the expert team. The methodological contribution was the combination of two object oriented methodology concepts of ‘subclassing’ and ‘reuse’ with the IBNDC approach to redesign, quantify and validate the time slices. With minimal changes, the static Lyngbya Science BN was changed into an OOBN and subclassed for each month while retaining the elicited values and thresholds for all the non-temporal nodes (reuse) and modifying expert opinion for the time slices. Chapter 7 highlights the importance that expert elicitation plays in defining the BN structure, quantifying, testing and evaluating it.

The scope of this thesis is centred on optimising and facilitating the design of BNs to present a comprehensive view of the current knowledge and research of the ecological issue of interest, and to create BNs in such a way that they enable integration with other networks. This was achieved by developing an approach to BN modelling within an OO framework which enables system integration, and the extension and redesign of existing models. The applied research has already generated interest from other organisations dealing with the conservation of endangered species such as the African wild dog (Lycaon pictus) in southern Africa and the red wolf (Canis rufus) in North America. However, the relevance of this work is not limited to ecological issues such as species survival and toxic algal blooms; it can readily be taken up in other fields such as epidemiology, biosecurity, bio-surveillance and airport security. Interest in applying the research from this thesis to the Western Australian Premier’s Water Foundation project has also been expressed.
It is hoped that this research will generate further interest and facilitate greater understanding of complex problems, leading to improved scientific knowledge and management practice in ecology and other inter-disciplinary fields. This motivates several areas of future research, four of which are briefly discussed below.

The first is to create generic BNs which include context specific contributions. The rationale here is that the IBNDC has been useful for constructing BN models in an OO framework, facilitating network integration and reuse. A novel application of this heuristic would then be to focus on modelling the network and subnetworks in such a way that the points of integration contain a generic and a domain specific component. If experts agree on the contribution of these components relative to each other, we will then have a BN which can more easily be reused and adapted to other similar ecological issues. For example, a generic model of human-predator conflict and predator relocation can be designed for cheetahs in such a way that it is applicable to many other endangered predators which have conservation issues similar to those of cheetahs. The OO network fragments can then be instantiated for the different predators. Two species for which this may be suited are the African wild dog (Lycaon pictus) in southern Africa and the red wolf (Canis rufus) in North America. Both species are on the IUCN Red List of endangered species. They suffer from a negative image and their principal threat to survival is human conflict (IUCN 2009). In a similar vein, Lyngbya BN models can be refactored to include a generic component and a country or catchment specific component, which can then be reused and domain specific information added.

The second opportunity for future research is to compare the BN approaches used in this thesis to other statistical approaches. Depending on the results of this research, it would be interesting and beneficial to the modelling community to investigate the amalgamation of techniques.

The third area is the application of the OO paradigm to the creation of other integrated statistical models. The design and construction of BNs, DBNs, OOBNs and DOOBNs are well suited to OO modelling techniques, as demonstrated by the development of the IBNDC heuristic and the ecological case studies considered in this thesis. However, other statistical models may not be able to utilise these OO concepts as readily as BNs. Nonetheless, the benefits of inter-connected models for an ecological system or issue of concern are clearly evident and worth pursuing.

The fourth is expert elicitation in Bayesian networks. Expert elicitation plays a vital role in BN construction, quantification and validation. However, it is an expensive way to construct BNs, and ways in which we can maximise the use of experts' time and expertise and reuse previously elicited information would be a valuable contribution to BN modelling.
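Returning to the first research direction, one simple way to picture an integration point with a generic and a domain specific component is a weighted mixture of two probability tables, with the weight reflecting the relative contribution agreed by the experts. The thesis does not prescribe a combination rule, so the linear mixture below is purely an assumption made for illustration, and the function name `mix_components`, the states and all numbers are hypothetical.

```python
def mix_components(generic, specific, weight_generic):
    """Linear mixture of two distributions over the same states.

    The mixture rule is an illustrative assumption, not a method
    taken from the thesis. weight_generic is the agreed share of
    the generic component; the remainder goes to the specific one.
    """
    if not 0.0 <= weight_generic <= 1.0:
        raise ValueError("weight_generic must lie in [0, 1]")
    if generic.keys() != specific.keys():
        raise ValueError("both components must share the same states")
    return {
        state: weight_generic * generic[state]
        + (1.0 - weight_generic) * specific[state]
        for state in generic
    }


# Hypothetical values: a generic predator component and a
# cheetah specific component for relocation success.
generic_predator = {"success": 0.6, "failure": 0.4}
cheetah_specific = {"success": 0.8, "failure": 0.2}

mixed = mix_components(generic_predator, cheetah_specific, weight_generic=0.5)
print({state: round(p, 3) for state, p in mixed.items()})
```

Instantiating the same fragment for another predator would then amount to replacing the specific component and, if the experts judge it necessary, the weight, while the generic component is reused.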
