

UNIVERSIDADE DE LISBOA
INSTITUTO SUPERIOR TÉCNICO

DSL3S: Domain Specific Language for Spatial Simulation Scenarios

Luís Alexandre Duque Moreira de Sousa

Supervisor: Doctor Alberto Manuel Rodrigues da Silva

Thesis approved in public session to obtain the PhD Degree in Information Systems and Computers Engineering

Jury final classification: Pass with Distinction.

Jury

Chairperson: Chairman of the IST Scientific Board

Members of the Committee:

Doctor Alberto Manuel Rodrigues da Silva, Associate Professor at Instituto Superior Técnico of Universidade de Lisboa

Doctor João Carlos Pascoal de Faria, Assistant Professor at Faculdade de Engenharia of Universidade do Porto

Doctor Pedro da Costa Brito Cabral, Assistant Professor at Nova Information Management School of Universidade Nova de Lisboa

Doctor Maria Armanda Simenta Rodrigues Grueau, Assistant Professor at Faculdade de Ciências e Tecnologia of Universidade Nova de Lisboa

Doctor Alexandre Bacelar Gonçalves, Assistant Professor at Instituto Superior Técnico of Universidade de Lisboa

Doctor Bruno Emanuel da Graça Martins, Assistant Professor at Instituto Superior Técnico of Universidade de Lisboa

2016

Oh, I’d search out every knowledge that I could find
Unravel all the mysteries of mind
If I only had time

Peter Hammill

Acknowledgments

I would like to start by thanking my supervisor, Prof. Alberto Rodrigues da Silva, for enduring with me in this long journey. His steering in the early stages of this thesis was capital in achieving its goals. His advice throughout the various phases of this work was always sharp and objective but never imposing. His method lasts as a lesson in research supervision.

I thank Ricardo Sousa and Dr. Fernanda Nery, colleagues and friends, with whom many discussions led to the concept behind this work. Their encouragement was vital in embracing the subject as a PhD thesis.

I thank Prof. Sean Luke and Dr. Mark Colleti for their support with the MASON simulation tool-kit. This acknowledgement extends to all active members in the MASON community. I am also obliged to the Eclipse community, in particular those participating in the Papyrus and Acceleo fora, far too many to reference here. Their help in clarifying issues and dealing with bugs was absolutely essential.

I thank Dr. Benoît Otjacques and Dr. Hichem Omrani for facilitating the test sessions with potential users. I also thank all the participants in these sessions; their feedback was precious.

I am grateful to my colleagues in the MUSIC project, Ulrich Leopold, Christian Braun, Christopher Eykamp, Rui Martins, Olivier Baume, Alessio Mastrucci and Matteo De Stefano. They made the role of worker-student considerably easier.

Many friends had an important role during these years, perhaps even without knowing, with encouraging words or uncommitted comprehension. In no particular order I would like to name Marisa, Nate, Luís Miguel, Euan, Gisela, Rui, Iain, Ana, Viola, Diogo, Mario, Gilles, Diana, Carole, Norry, Daisy.

Special acknowledgements to my closest family, Odete and Sara; I can only reciprocate their love. And to my family of recent years, Tóni, Celeste, Guida, David and Nuno, to whom I am forever indebted.

Finally, I thank my father. After all these years his advice and teachings still echo in me.

Chãs d’Égua, 28th of August, 2015
Luís Alexandre Duque Moreira de Sousa


Abstract

Techniques such as cellular automata and agent-based models have long been used in the field of Geographic Information Systems (GIS) to capture and simulate the dynamics of change in spatial information. However, unlike most other spatial analysis techniques, spatial simulation has been largely absent from traditional GIS software packages. The relative complexity of spatial simulation has resulted in a myriad of independent tools, each with different features, in many cases focusing on highly specific contexts. Contrary to other disciplines (e.g. systems engineering), a simulation tool for GIS that covers a wide variety of application domains yet remains accessible to non-programmers seems largely lacking.

Between code libraries and pre-compiled models, the spatial analyst faces a non-trivial choice among the vast number of spatial simulation tools available today. Early on, a compromise must be made between the wider application range of a code library and the ease of use of a pre-compiled model, with relevant consequences throughout the analysis process. Various domain specific languages (DSLs) have attempted to fill this gap between code libraries and pre-compiled models (e.g. SELES, NetLogo, Ocelet). However, these have invariably resulted in new textual programming languages, which still require some level of programming skills and do not raise the level of development abstraction. Moreover, portability and GIS interoperability are often issues with these languages.

This dissertation proposes a Model-Driven Development (MDD) approach to spatial simulation in the field of GIS. This approach includes a DSL and a companion framework/tool. The language is DSL3S, the acronym for “Domain Specific Language for Spatial Simulation Scenarios”, which captures relevant simulation concepts in a graphical language. DSL3S is formalised as a UML profile that allows the design of simulation models through the arrangement of graphical elements. Furthermore, a prototype implementation of this language was developed, relying on the MDD tools distributed with the Eclipse IDE. This prototype is named MDD3S, the acronym for “Model-Driven Development for Spatial Simulation Scenarios”. It includes a model-to-code transformation infrastructure that produces ready-to-run simulations from DSL3S models, supported by the MASON simulation tool-kit. These tools are packaged as plug-ins that may be seamlessly added to Eclipse.

In addition, various assets were produced to facilitate the usage of the language and the prototype: a web-based manual, a series of tutorial videos and a collection of simple models illustrating the usage of DSL3S in classical applications such as wildfires or urban sprawl.

An evaluation programme was conducted during which experts in GIS and related fields were given a first exposure to DSL3S and the accompanying MDD3S framework, and then asked to answer a questionnaire. This evaluation points to a good degree of ease of use, with the first quartile in positive territory for all questions related to the language and the framework. The results are less positive for the MDD approach in general, but still point to some adoption potential.

Keywords

Spatial Simulation; Domain Specific Language; Model-Driven Development; UML Profile; Model-to-Code Transformation.


Summary

Techniques such as cellular automata and agent-based models have long been used in the domain of Geographic Information Systems (GIS) to capture and simulate the dynamics of change in spatial information. However, unlike other spatial analysis techniques, spatial simulation has remained largely absent from traditional GIS programs. The relative complexity of spatial simulation has resulted in a myriad of independent tools, each with different functionalities, in many cases focused on very specific contexts. Contrary to other disciplines (e.g. systems engineering), a simulation tool for GIS with a wide application scope and accessible to non-programmers does not seem to exist.

From code libraries to pre-compiled models, the analyst faces a non-trivial choice among the vast number of spatial simulation tools currently available. From the outset a compromise is imposed, either for greater application range with a code library or for ease of use with a pre-compiled model, with relevant consequences throughout the analysis process. Several domain specific languages have been attempted to bridge this gap (e.g. SELES, NetLogo, Ocelet). However, these have invariably resulted in new textual languages, still requiring some level of programming knowledge, without raising the level of abstraction of development. Moreover, portability and interoperability with GIS are recurring difficulties with these languages.

This dissertation proposes the application of the Model-Driven Development (MDD) methodology to spatial simulation in the field of GIS. This includes a domain specific language and a development tool. The language is named DSL3S, the acronym for “Domain Specific Language for Spatial Simulation Scenarios”, and captures relevant simulation concepts in a graphical language. DSL3S is formalised as a UML profile, which allows the development of spatial simulation models through the composition of graphical elements. Furthermore, a prototype implementation of this language was developed, relying on the MDD tools distributed with the Eclipse development environment. This prototype takes the name MDD3S, the acronym for “Model-Driven Development for Spatial Simulation Scenarios”. It includes a model-to-code transformation infrastructure that produces ready-to-run simulations from DSL3S models, supported by the MASON simulation library. These tools are available as plug-ins for Eclipse.

Several support materials were also created in order to facilitate the use of the language and the prototype: an electronic manual, a set of introductory videos and a collection of simple models illustrating the use of DSL3S in classical applications such as forest fires or urban growth.

An evaluation programme of the language was conducted, exposing experts in GIS and related fields to a first contact with DSL3S and the tools developed; they subsequently answered a questionnaire. This evaluation points to a good degree of ease of use, with the first quartile of the distribution in positive values for all questions related to the language and the prototype. The results are less positive for the MDD methodology, but still point to some adoption potential.

Key Words

Spatial Simulation; Domain Specific Language; Model-Driven Development; UML Profile; Model-to-Code Transformation.


Contents

1 Introduction
    1.1 Context
    1.2 History
    1.3 Tool Support Spectrum
        1.3.1 Program-level Tools
        1.3.2 Model-level Tools
        1.3.3 Domain Specific Languages
    1.4 Problems
    1.5 Thesis Statement
    1.6 Research Goals
    1.7 Research Methodology
    1.8 Original Contributions
        1.8.1 Conference communications
        1.8.2 Journal Articles
    1.9 Thesis Outline
2 Related Work
    2.1 Concepts of Spatial Simulation
    2.2 Concepts of Cellular Automata and Agents
        2.2.1 Cellular Automata
        2.2.2 Agents
    2.3 Previous DSLs for Spatial Simulation
        2.3.1 NetLogo
        2.3.2 SELES
        2.3.3 MOBIDYC
        2.3.4 Ocelet
        2.3.5 GAML
        2.3.6 AML
        2.3.7 Discussion
    2.4 Model-Driven Development
3 Proposed Approach
    3.1 The Spatial Simulation Analysis Process
        3.1.1 Roles
        3.1.2 Tasks
        3.1.3 Artefacts
    3.2 Spatial Simulation with DSL3S
        3.2.1 DSL3S Language and Framework
        3.2.2 The Spatial Analysis Process with DSL3S
4 The DSL3S Language
    4.1 Abstract Syntax
    4.2 Concrete Syntax
    4.3 Structural Semantics
    4.4 Model Organisation - Views
    4.5 Proposed Icons
5 The DSL3S Framework
    5.1 MDD3S - Prototype Implementation
        5.1.1 Papyrus
        5.1.2 Acceleo
        5.1.3 MASON
        5.1.4 GeoMASON
    5.2 Resulting Source Code
    5.3 Support Contents
6 Evaluation
    6.1 Use Cases
        6.1.1 Simulation Model A – Predator-Prey
        6.1.2 Simulation Model B – Forest Fire
        6.1.3 Simulation Model C – Urban Sprawl
        6.1.4 Simulation Model D – Game of Life
    6.2 User Experience
        6.2.1 Sample
        6.2.2 Results
            6.2.2.A Performance
            6.2.2.B Ratings
        6.2.3 Interpretation
    6.3 Comparison With Related Work
        6.3.1 Language-level Comparison
        6.3.2 Tool Support Comparison
        6.3.3 Summary
7 Conclusion
    7.1 Summary
    7.2 Thesis Goals Revisited
    7.3 Future Work
Bibliography
Appendix A Stereotypes
Appendix B Evaluation Session Guide
Appendix C Evaluation Questionnaire

List of Figures

1.1 Detail of John von Neumann’s self replicating automaton; courtesy of Will Stevens.
1.2 The spatial simulation tools spectrum (adapted from Fall and Fall [Fall and Fall, 2001]).
1.3 The prototype development cycle employed in this work.
2.1 Three neighbourhood types, from left to right: von Neumann, Moore, and Extended Moore.
2.2 A Wildfire simulation with cellular automata (from Li and Magill [Li and Magill, 2001]).
2.3 A disaster simulation in a city.
2.4 Hierarchy of basic types in GAML.
2.5 Traditional software engineering process compared to the MDD methodology.
3.1 The spatial simulation analysis process.
3.2 The simplified spatial simulation analysis process using DSL3S.
4.1 DSL3S meta-model.
4.2 DSL3S model views.
4.3 The icons proposed for the DSL3S stereotypes.
5.1 The technologies used to implement MDD3S.
5.2 Main views of the Papyrus perspective in Eclipse: Model Explorer, Model Canvas and UML Palette.
5.3 Various spatial datasets and their minimum bounding rectangle.
5.4 Domain model of the code produced by MDD3S.
6.1 Predator-Prey model in DSL3S; Simulation and Scenario Views.
6.2 Predator-Prey model in DSL3S; Prey View.
6.3 Predator-Prey model in DSL3S; Predator and Interaction Views.
6.4 A sample run of the Predator Prey DSL3S simulation.
6.5 Forest fire model in DSL3S.
6.6 A sample run of the Fire DSL3S simulation.
6.7 Urban Sprawl model in DSL3S.
6.8 A sample run of the Urban Sprawl DSL3S simulation.
6.9 The Game of Life model in DSL3S.
6.10 A sample run of the Life DSL3S simulation, with living cells portrayed in white.
6.11 General profile of the sample.
6.12 Experience profile of the sample.
6.13 Distribution of blocking issues.
6.14 Exercise completion distribution.
6.15 Box plot diagrams for the language evaluation.
6.16 Box plot diagrams for the tool usability evaluation.
6.17 Box plot diagrams for the general approach evaluation.

List of Tables

4.1 Valid relationships in DSL3S with respective cardinalities.
5.1 The Java service hasLinkedStereotype used in MDD3S to determine if a model element is linked to elements of a specific type.
5.2 The MDD3S template for the Perish stereotype.
6.1 Summary statistics for the language evaluation.
6.2 Summary statistics for the tool usability evaluation.
6.3 Summary statistics for the general approach evaluation.
6.4 Language-level comparison of several simulation DSLs with DSL3S.
6.5 Tool support comparison of several simulation DSLs with DSL3S.

Abbreviations

AML       Agent Modelling Language
DSL       Domain Specific Language
DSL3S     Domain Specific Language for Spatial Simulation Scenarios
EBNF      Extended Backus-Naur Form
EMF       Eclipse Modelling Framework
GAML      GAMA Modelling Language
GIS       Geographic Information Systems
GML       Geography Markup Language
MDA       Model-Driven Architecture
MDE       Model-Driven Engineering
MDD       Model-Driven Development
MDD3S     Model-Driven Development for Spatial Simulation Scenarios
MOBIDYC   Modelling Based on Individuals for the Dynamics of Communities
MOFM2T    Meta-Object Facility Model to Text Transformation
OCL       Object Constraint Language
OGC       Open Geospatial Consortium
OMG       Object Management Group
OO        Object-Oriented
SELES     Spatially Explicit Landscape Event Simulator
UML       Unified Modelling Language

1 Introduction

Contents

1.1 Context
1.2 History
1.3 Tool Support Spectrum
1.4 Problems
1.5 Thesis Statement
1.6 Research Goals
1.7 Research Methodology
1.8 Original Contributions
1.9 Thesis Outline

1.1 Context

The data stored in an information system usually portray the world as it was at a specific point or interval in time. This is especially true for spatial data, with the added certainty that they will also evolve. The patterns of land cover and land use, and of social, economic and demographic variables in general, constantly change with time. Objectively, all spatial data are only valid within a specific time frame, as if any cartographic composition were a still picture of the spatial elements represented. In order to deal with this reality, entire organisations exist with the sole purpose of collecting and updating spatial data, through field campaigns with on-site visits or through airborne or spaceborne data acquisition [Kraak and Ormeling, 2009]. Nonetheless, regular data collection provides at best a periodic picture of a changing reality, which for some applications might not be enough [Batty, 2007].

Stakeholders of an information system may need to know not only how the data changed in the past; in order to plan ahead or otherwise reason upon the data, they may also need to understand why the data changed the way they did and how they might continue to evolve in the future. This need is met by resorting to a particular methodology in the spatial analysis discipline known as Spatial Simulation. It involves, in the first place, a modelling phase in which the fundamental drivers of change – the spatial dynamics – are captured into mathematical, logic or functional constructs. The resulting models are then applied to sets of input data during a certain period of time [de Smith et al., 2015]. These heuristic or conceptual models are refined by applying them to periods of time for which it is known how the data evolved, thus allowing for validation and/or calibration. When models reach a satisfactory level of success against known data they can then be applied to periods of time for which knowledge is scarce (usually the future). This produces new sets of data, pictures of time epochs missing from the base data [Law, 2007].

Cellular Automata [Wuensche and Lesser, 1992] is the oldest technique used in spatial simulation; in it the world is discretised into a grid of regular cells that evolve according to a fixed set of rules. More recently, Agent-based Modelling has grown into a popular paradigm, with wide application in the GIS context [Batty, 2007]. An agent can be defined as an autonomous object that perceives and reacts to its environment, a concept that stems from Object-Oriented (OO) programming [Ferber, 1999] [Weiss, 1999]. Agent-based modelling and cellular automata overlap to some extent in the GIS context, though the former brought new processing possibilities, with geographic entities not only reacting to stimuli but also storing knowledge and reasoning before acting. Agents can also be used to model phenomena that do not have direct geographic meaning, such as social or economic interactions [Epstein and Axtell, 1996].

After well over two decades of development and practical application, the usage of spatial simulation nevertheless remains relatively restricted. Most GIS software packages do not provide direct support and in certain domains modelling requires solid programming skills. Typical GIS users and spatial analysts still face relevant challenges in employing these techniques. These challenges are the primary motivation for this work. Furthermore, a fresh approach to the subject may also foster improvements in the way spatial simulation models are described and communicated to peers and stakeholders.

This chapter provides an introduction to this thesis, its motivation and goals. It starts in Section 1.2 with general historical elements that shaped the techniques employed in spatial simulation. Section 1.3 then lays out the types of tools available and their shortcomings. Section 1.4 gathers the difficulties identified into a series of research questions that lead to the Thesis Statement in Section 1.5 and the research goals in Section 1.6. Section 1.7 describes the research methodology employed and Section 1.8 lists the scientific contributions resulting from this thesis. Section 1.9 closes this chapter by outlining the remainder of this document.
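The modelling cycle described in this section, in which a model is run over a period with a known outcome and its parameters are adjusted until it reproduces that outcome, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from DSL3S or any of the tools discussed in this thesis: the toy spread rule, the `threshold` parameter, and the function names `step`, `simulate`, `mismatch` and `calibrate` are all hypothetical.

```python
# Toy calibration of a cellular automaton against a known period.
# All names and the spread rule itself are hypothetical illustrations.

def step(grid, threshold):
    """One synchronous update: an empty cell (0) becomes occupied (1) when at
    least `threshold` of its four von Neumann neighbours are occupied."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0:
                occupied = sum(
                    grid[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols
                )
                if occupied >= threshold:
                    new[r][c] = 1
    return new


def simulate(grid, threshold, steps):
    """Apply the rule `steps` times and return the resulting grid."""
    for _ in range(steps):
        grid = step(grid, threshold)
    return grid


def mismatch(simulated, observed):
    """Number of cells where the simulated and observed grids differ."""
    return sum(
        s != o
        for sim_row, obs_row in zip(simulated, observed)
        for s, o in zip(sim_row, obs_row)
    )


def calibrate(start, observed, steps, candidates):
    """Pick the candidate threshold that best reproduces the observed grid."""
    return min(
        candidates,
        key=lambda t: mismatch(simulate(start, t, steps), observed),
    )


if __name__ == "__main__":
    start = [[0] * 5 for _ in range(5)]
    start[2][2] = 1
    # Stand-in for real observations: data generated with a known threshold.
    observed = simulate(start, 1, 2)
    print("calibrated threshold:", calibrate(start, observed, 2, [1, 2, 3]))
```

In practice the observed grid would come from data collected at a later epoch rather than from the model itself, and the calibrated model would then be run beyond the known period.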

1.2 History

Throughout History the concept of automaton can be found embodied in many forms: in general, machines (or ideas of machines) capable of substituting Man at some particular complex task. But it was only with the Industrial Revolution that complex machinery appeared, pushing the concept of automaton close to what it is today. According to McIntosh [1990], the experiments of Charles Babbage [Swade, 1998] in particular can be considered the seed that led to modern-day digital automata. During the Second World War various concepts akin to Babbage’s machines were realised using electronic components. At that time, Warren McCulloch and Walter Pitts [1943] put the concept of automaton in a different perspective, producing a model mimicking the connectivity of the neural system and the signals propagating through it.

After the War this work inspired John von Neumann while he tried to synthesise a self replicating automaton. Von Neumann would eventually follow the suggestion of mathematician and friend Stanislaw Ulam for a lattice-based model. He accomplished his goal in a grid of automaton cells, drawn with pencil on squared paper, using different graphical symbols for each cell state [von Neumann and Burks, 1966] (a detail of this automaton is reproduced in Figure 1.1). Von Neumann’s was the first cellular automaton in History, and it contrasted markedly with previous automata realisations. While former concepts concentrated on inputs and outputs (the processing of signals), von Neumann’s automaton cells acted as a function of their Neighbourhood. Moreover, von Neumann’s design freed automata into an infinite world, the backing lattice, for which there are no abstract limits¹.

¹ With 29 different states and 29 transition rules, von Neumann’s model uses more than 6 000 automaton cells and takes 63 thousand million time steps to fully replicate. As late as the 1990s this would still take several years to produce with a personal computer.

Figure 1.1: Detail of John von Neumann’s self replicating automaton; courtesy of Will Stevens.

Cellular automata remained an obscure sub-field of computation theory for several decades. That changed completely in 1970, when the Game of Life was reported by Gardner [1970] in the Scientific American magazine. Created by John Conway, this model stood out for its simplicity and elegance, with only two states and three rules, yet was still capable of yielding very complex behaviour. Conway [1971] published the theoretical background to the game the following year and its popularity grew rapidly, even justifying a dedicated newsletter for some years [McIntosh, 1990].

Although growing in popularity during the 1970s, cellular automata were still more of a mathematical curiosity than anything else. Stephen Wolfram would be the first to thoroughly analyse not just a single automaton but a wide range of different one-dimensional automata. From his observations he proposed a classification framework that is still in use today, reflecting different automata behaviours:

Class I – Constant Field – automata that converge to fixed patterns (i.e. “dead”);
Class II – Isolated Periodic Structures – those that converge to patterns that repeat cyclically;
Class III – Uniformly Chaotic Fields – exhibit aperiodic patterns that do not converge;
Class IV – Isolated Complex Structures – exhibit complicated internal behaviour, with localised patterns showing very long cycles or eventually “dying”.

Wolfram’s classes introduced concepts such as Complexity and helped shape the field into something more than a curiosity. During the early 1980s Wolfram and his colleagues published a series of articles that formalised important aspects of cellular automata theory, especially its parallel with language theory. This work was collected in the seminal book “Theory and Applications of Cellular Automata” [Wolfram, 1986].

With more powerful computers and improved graphical technology available, applications to real life problems started appearing. Wolfram himself proposed the usage of cellular automata applied to non-linear equations describing chemical reactions [Wolfram, 1986]. In this field, several studies were conducted to reproduce the Navier-Stokes equations for gaseous matter behaviour, something eventually achieved with a hexagonal lattice automaton [Frisch et al., 1986]. The first applications in GIS also emerged around this time, specifically with proposals for digital image processing [Kendall Jr. and Duff, 1984].
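To make the notion of cells acting as a function of their neighbourhood more concrete, the Game of Life mentioned earlier in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `life_step` is hypothetical, and the lattice is a small finite grid whose outside is treated as permanently dead, whereas the original game is defined on an unbounded lattice. The rules encoded are the commonly stated ones: a dead cell with exactly three live neighbours is born, a live cell with two or three live neighbours survives, and every other cell dies or stays dead.

```python
# Minimal Game of Life step on a finite grid (cells outside the grid are
# assumed dead, an illustrative simplification of the unbounded lattice).

def life_step(grid):
    """Return the next generation of a grid of 0 (dead) and 1 (alive) cells,
    using the eight-cell Moore neighbourhood."""
    rows, cols = len(grid), len(grid[0])

    def live_neighbours(r, c):
        return sum(
            grid[rr][cc]
            for rr in range(r - 1, r + 2)
            for cc in range(c - 1, c + 2)
            if (rr, cc) != (r, c) and 0 <= rr < rows and 0 <= cc < cols
        )

    new = []
    for r in range(rows):
        row = []
        for c in range(cols):
            n = live_neighbours(r, c)
            if grid[r][c] == 1:
                row.append(1 if n in (2, 3) else 0)  # survival or death
            else:
                row.append(1 if n == 3 else 0)       # birth
        new.append(row)
    return new


if __name__ == "__main__":
    # A "blinker": three cells in a row that oscillate with period two.
    blinker = [
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    for line in life_step(blinker):
        print("".join("#" if cell else "." for cell in line))
```

The update is synchronous: every cell of the new generation is computed from the previous generation only, which is why a fresh grid is built instead of modifying the input in place.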
These earlier simulation models captured the world in a discrete form, with space uniformly divided into cells of fixed dimension. Precisely the same concept is used in GIS to capture and represent spatially continuous variables: the commonly known raster format. This coincidence was noted early on during the development of the GIS field, notably by Itami [1988].

The late 1980s witnessed an evolution in a parallel field that would have important impacts on spatial simulation. Up to that time artificial intelligence had been studied around centralised computing mechanisms performing complex tasks. This concept started evolving towards decentralised realisations of intelligence, relying on a relatively large number of small computational units [Hern, 1988]. Originally called "bots", in the 1990s they would be popularised as "agents". Agents largely benefited from developments in computer science, namely the emergence of OO programming. In fact, agent-based systems would play an important role in the advancement of Computer Science throughout the 1990s [Silva et al., 2001]. The agent concept fitted perfectly in this new paradigm: an object with a set of properties that store its state and methods that determine how it reacts to stimuli. In contrast, cellular automata had been largely treated in imperative or functional programming environments. For spatial simulation, agents provided movement in a rather explicit way: entities that reason upon their neighbourhood and internal state and choose how to move accordingly.

The 1990s saw the real take-off, with spatial simulation penetrating fields of research outside the specific domain of GIS: Economics, Biology, Ecology, Urban Planning, Transport. This was possible largely due to the appearance of another important asset: the spatial simulation code library. Programmed in Objective-C, Swarm [Minar et al., 1996] was the first of these libraries to gain wide popularity; it provided classes to help develop simulation models and tools to facilitate their graphical rendition. In recent years ever more specific tools have been made available, especially with the progressive commitment from software vendors. Many are easy-to-use pre-programmed models that need only be parametrised by the user to conduct a simulation.

1.3 Tool Support Spectrum

Spatial simulation is possibly the most complex of the several spatial analysis techniques. Whereas a simple statistical or mathematical trend analysis may be predictive enough for other kinds of data, it is insufficient with spatial data due to its multi-dimensionality and heterogeneity. Furthermore, these characteristics of spatial data also tend to result in highly specific simulation models, only usable within the particular application domain. Therefore, most spatial simulation models are developed ad hoc by the end-user organisation. Modern GIS software packages, such as QGIS, gvSIG or ArcGIS, still largely lack easy-to-use tools dedicated to spatial simulation.

In a spatial simulation model implemented with a general purpose programming language the majority of the instructions coded are extraneous to spatial dynamics concepts. Besides implementing the model, the program must control the flow of execution, manage system resources, and manipulate data structures. Burdening model development with these tasks can lead to several problems [Fall and Fall, 2001]: (i) difficulty verifying that the program implements the model correctly; (ii) limited model generality, since modification and adaptation are effortful; (iii) difficulty comparing computer models, with comparison usually restricted to their inputs and outputs [Olde and Wassen, 1997]; (iv) problematic integration with other models or tools (e.g. GIS or visualisation packages), often limited to the exchange of output files. Implementing a model from scratch with a bare programming language is possibly the most costly option in spatial simulation.


Beyond general purpose programming languages, a spectrum of spatial simulation tools can be devised, sketched in Figure 1.2, ranging from those that provide support at Program-level, closer to the programming language, to those that operate at Model-level, closer to the conceptual model that represents spatial dynamics [Fall and Fall, 2001, de Sousa and Silva, 2011b]. Each of these categories has a set of advantages and drawbacks that must be carefully weighed before choosing a particular tool. These two types of tools are detailed in the following sub-sections.

[Figure 1.2 content: a spectrum running from Program-level (general purpose, wide application, higher development cost, imperative implementation) to Model-level (specific purpose, built-in assumptions, fast development, declarative implementation). Along it are placed general-purpose programming languages; Swarm, RePAST, SimScript and MASON; the Domain Specific Languages NetLogo, MOBIDYC, GAML, Ocelet and SELES; and the parametrisable pre-programmed models TELSA, LANDIS, OBEUS and SLEUTH.]

Figure 1.2: The spatial simulation tools spectrum (adapted from Fall and Fall [2001]).

1.3.1 Program-level Tools

Program-level support tools extend the facilities available in general-purpose programming languages, usually providing useful software libraries for the development of specific classes of models. This approach substantially reduces coding time and can increase program reliability. Higher-level code, usually in a general-purpose OO programming language, specifies how objects are used to produce the desired model behaviour. These tools are referred to by different but interchangeable names: code packages, code libraries or tool-kits.

The main advantage of this type of tool is the separation of the model from functionality not directly related to spatial dynamics. This functionality includes graphical display, data input and output, scheduling and statistical data collection, among others; for each of these a plethora of functions is provided in the form of a code library. The improvements are twofold: first, the modeller is relieved of banal programming tasks, allowing a greater focus on dynamics; second, the resulting code is leaner and easier to read, for much complexity is encapsulated and standardised by the code library.

On the downside, these tools require an extra learning effort for their proper use. Beyond demanding strong knowledge of the base programming language [Tobias and Hofmann, 2004], a modeller wishing to use one of these tools must learn in some detail the behaviour of at least part of the functions, objects and methods provided by the tool-kit. Samuelson and Macal [2006] note that a full understanding of one of these code libraries is achievable only with several months of practice. Moreover, Benenson and Torrens [2004] suggest that with denser libraries, programmers can eventually run into conflicting or incompatible functionality that is only found at later development stages.
These disadvantages have been mitigated to some extent with the emergence of user communities that share experiences and provide informal assistance, and also by opening and sharing the tool-kit source code.


De Smith et al. [2015] report that today more than one hundred of these tool-kits are available worldwide. The most popular still include Swarm [Berryman, 2008], and the Java based libraries MASON [Luke et al., 2005] and RePAST [Collier et al., 2003].

1.3.2 Model-level Tools

Model-level support tools allow the usage of spatial simulation models without requiring programming skills. They are pre-programmed models, designed for specific application fields, that can be parametrised by the user. The larger the number of parameters the user can set and update, the greater the flexibility of the tool. They greatly shorten the time from model conceptualisation to simulation and provide fairly straightforward mechanisms for model tuning. However, they invariably constrain the spatial analyst to a specific application domain and dynamics framework.

There are popular examples of this kind of tool in different application fields. OBEUS [Benenson and Torrens, 2004] is a tool aimed at Urban Planning/Management, based on the theory of Geographic Automata Systems. It focuses on spatial relationships between different urban objects and their evolution with time, through migration and adaptation. LANDIS [Mladenoff, 2004] is a simulation model capable of simulating forest landscape dynamics in both space and time. Conceived in the 1990s, LANDIS has been used in forest management applications around the world. TELSA [Merzenich and Frid, 2005] is a highly-specialised simulation model directed at ecosystems management, providing insight into the outcome of different management scenarios and the consequences of landscape disturbances (fire, plagues, etc.). SLEUTH [Clarke et al., 1997] is a popular urban sprawl simulation model developed in the 1990s that has been applied successfully to different cities in the world [Yi and He, 2009].

Model-level support tools tend to be quite specific: much of the model behaviour and assumptions are hidden in the program and may not be made explicit or modified; their use in other application fields is largely impossible. The analyst can indeed dispense with programming skills when using this kind of tool, but becomes constrained to a specific field and overall simulation behaviour.
They also tend to narrow the interaction with geo-referenced data, by imposing certain formats or in some cases by lacking output functionality. Evolving or generalising these tools can sometimes become too expensive, dooming them to extinction. Traditionally, they take advantage of market niches, providing for the needs of a specific and restricted group of users, hence the commercial nature of many of them. Community support is usually weak or non-existent; more often support is a paid service.

1.3.3 Domain Specific Languages

Somewhere in the middle of this spectrum lie the Domain Specific Languages (DSLs). These languages attempt to tackle the disadvantages found at each extreme of the tool support spectrum, while conserving most advantages of both. However, these DSLs have remained restricted to textual languages, retaining important drawbacks of Program-level tools. A broader review of these previous spatial simulation DSLs and their shortcomings is provided in Section 2.3.


1.4 Problems

Analysts working with spatial data are trained either in GIS related disciplines – Geography, Cartography, Geodesy – or in the scientific domains of application, such as Biology, Economics or Environmental Science. Even higher education programmes in these fields largely lack training in programming, particularly in OO development. Spatial analysts thus generally lack the knowledge and practice of trained programmers, and are unable to use the most common spatial simulation tools. The involvement of programmers in spatial simulation analysis is frequent, demanding a further communication step in the process of implementing or realising a model concept.

On the other hand, opting for pre-compiled Model-level tools also imposes its share of burdens. It is often hard or impossible to verify the correct implementation of this sort of model, since most are closed source, commercial tools. Experimenting with different behaviours or feeding in alternative spatial information is impossible, which can lead the analysis to conform to the model, whereas the opposite would be the correct approach.

Regarding model descriptions, if a simulation model can only be fully described by the implementing source code, it then becomes unreadable to spatial analysts and stakeholders lacking training in programming. There have been attempts to classify Agent-based models [Silva et al., 2001] and to formalise model descriptions [Grimm et al., 2006]; but anything resembling an overarching approach is yet to emerge [Müller et al., 2014]. Furthermore, source code specificities, such as data input/output, syntactic structure or programming paradigm, cast a layer of obfuscation that makes it hard to compare different models using source code directly. In reality, there are numerous concepts common to any spatial simulation, such as the succession of time, spatial variables, agents, behaviours or spatial location.
For example, a wildfire simulation model can appear entirely different from a land use simulation model, simply because different tools were used to implement each. Without some sort of common descriptive lexicon, models are harder to compare and communicate, even those produced for the same application domain.

The DSLs tried in this field have been mostly constrained to fourth generation programming paradigms, and none has achieved the popularity of traditional Program-level tools. They may also compromise data interoperability. Many different spatial data file formats exist today, with some DSLs and Model-level tools limiting support to those issued by a single software vendor.

With almost three decades of history in the GIS field, spatial simulation is still inaccessible to typical spatial analysts. However, potential exists for a wider adoption of spatial simulation techniques, provided that new tools emerge with different approaches that make model development more accessible to non-programmers. This evolution is likely to require the emergence of a common ontological lexicon for model description.


In summary, this research identifies the following problems:

Prob. 1: most spatial simulation tools require specialised training in programming.

Prob. 2: those tools that do not require such knowledge are narrowly scoped and tend to compromise GIS interoperability.

Prob. 3: an integrated approach to the description, documentation and communication of spatial simulation models is largely lacking [Müller et al., 2014].

1.5 Thesis Statement

The thesis statement that this work aims to verify is the following:

The development of spatial simulation models can be improved and made more accessible to common spatial analysts with a graphical domain specific language and accompanying development framework, able to automatically generate ready-to-run implementations from models developed with this language.

This language is named DSL3S – Domain Specific Language for Spatial Simulation Scenarios. DSL3S is in essence an application of the Model-Driven Development (MDD) methodology to the spatial simulation domain.

1.6 Research Goals

The core aim of DSL3S is to provide spatial analysts with the means to rapidly prototype spatial simulation models with graphical diagrams, built with parametrisable elements representing clear simulation concepts. Such models can then be fed to a model-to-code transformation facility to produce a ready-to-run implementation based on a popular Program-level code library. Considering the output of previous research on graphical DSLs [Clark and Muller, 2012, Paige and Varró, 2012, Mohagheghi et al., 2013], the research goals pursued with this approach are:

Goal 1: Faster development and prototyping, allowing spatial analysts to rapidly test their heuristic or conceptual spatial dynamics applied to the problem at hand, identifying early on the most suitable paths of thinking.

Goal 2: Reduction of development errors, through the reduction, or outright elimination, of coding activities.

Goal 3: Increased model readability, with models described through graphical diagrams, an expressive representation easier to interpret and closer to natural speech.

Goal 4: Improved GIS interoperability, guaranteed by the usage of the most advanced code libraries at implementation level, providing support for a wide array of spatial data formats.


Goal 5: One model, several implementations, since model-to-code transformations may be developed for different target Program-level tools and/or computing platforms, thus allowing a single model to be experimented with different implementations.

1.7 Research Methodology

The issues identified in Section 1.4 are approached from an engineering perspective, considering the set of goals proposed in Section 1.6. In this light, available technology is employed to build a prototype that allows the development and validation of spatial simulation models by spatial analysts. This research makes use of the following techniques: (i) analysis of the state of the art, identifying the difficulties posed by existing spatial simulation DSLs; (ii) conceptual solution, with the proposal of a DSL capturing relevant features and concepts in the research context; (iii) prototype construction, providing means for model development using a domain specific language; (iv) production of best practices, with user guides and other support materials; (v) validation through use cases and user evaluation.

This research work follows the Action Research method [O’Brien, 2001], which proposes an approach different from the classical scientific pattern of hypothesis testing. Action Research is rather an iterative process whereby current knowledge is systematically questioned and re-evaluated. Young et al. [2010] detail this iterative framework as a cycle composed of four stages: (i) plan - determining the focus of the cycle and how to put it into practice; (ii) action - implementing the actions set out in the plan; (iii) observe - recording the relevant outcomes from the actions implemented; and (iv) reflect - revising current knowledge in light of the outcomes.

Spatial simulation has been applied to different contexts, among which the taxonomy related to spatial dynamics can vary considerably. Undertaking all these contexts simultaneously would likely have been impractical. Therefore, the iterative framework prescribed by the Action Research method suited this work well, with the successive tackling of different spatial simulation application contexts.
This way different terms and classifications were progressively consolidated into general spatial simulation concepts, improving the language and the development framework at each cycle.

The development of the prototype starts with a first cycle to produce an embryonic language, including raw concepts of spatial simulation found in the literature. Iteratively, the language and the prototype model-to-code transformations are applied to different models. At each cycle an example application is selected for implementation (e.g. Predator-Prey, Urban Sprawl), allowing the identification of language and transformation shortcomings in four successive steps. The first step consists in assessing the adequacy of the language for the application sub-domain in question (plan stage). If necessary, the language is extended or refined to guarantee the correct modelling of the example. In the second step (corresponding to the action stage) the model-to-code transformations are developed; if new language elements have to be introduced, new transformation templates must be created. In other cases, refinements and improvements to the transformation outputs may be required. In the third step, the improved version of the language and transformations is verified against all the models previously iterated (observe stage). If the modifications in the current cycle introduce errors into previous models, the transformations must be consolidated again (reflect stage). A new cycle can then commence, with the tackling of a different example application. Figure 1.3 summarises this development process.

Figure 1.3: The prototype development cycle employed in this work.

The example applications used to develop the language were selected from reference models described in the scientific literature, thus constituting a set of case studies. The resulting DSL3S models provide in themselves a validation asset, by showcasing the application of the language to these classical spatial dynamics problems. In a final research iteration the language and the prototype were subject to an evaluation by peers and potential users. In various test sessions, experts and professionals in GIS and related fields were exposed for the first time to DSL3S, providing usability and adequacy feedback through an evaluation questionnaire. Additionally, the results of this research work were submitted to international journals and conferences on GIS, simulation and computer science, gaining approval from the scientific community. Section 1.8 details the scientific output resulting from this work.


1.8 Original Contributions

A number of peer-reviewed articles originated from this thesis. They were used to assess the validity of the approach proposed and later on to validate the end results. Below, these articles are broken down into Conference communications and Journal publications.

1.8.1 Conference communications

de Sousa, L. e Silva, A. R.. DSL3S - Uma Linguagem Específica do Domínio para Simulação Espacial. VII CNCG, Maio 2011.
This article presented for the first time the concept of a graphical DSL for spatial simulation within the Portuguese scientific GIS community. It gathered the first impressions on the concept from spatial analysis experts and scientists in the field.

de Sousa, L. and Silva, A. R.. Capturing Spatial Simulation specifics in Geographic Information Systems with a UML Profile. INForum 2011, September 2011.
The language was again presented at a more refined stage to a broader and international community. This communication gathered the first formal and informal feedback from computer scientists.

de Sousa, L. and Silva, A.R., Review of Spatial Simulation Tools for Geographic Information Systems. SIMUL 2011, October 2011.
A thorough review of spatial simulation tools was communicated, outlining in detail existing drawbacks that pose relevant challenges to their adoption. This communication provided further ground for a new approach to the domain.

de Sousa, L. and Silva, A.R.. Preliminary Design and Implementation of DSL3S – a Domain Specific Language for Spatial Simulation Scenarios. Proceedings of CAMUSS, Oporto, Portugal, November 2012.
The implementing framework was publicly presented for the first time in this article. The usage of the language and the model-to-code transformation were demonstrated to the audience. This conference provided the opportunity to discuss DSL3S with leading scientists in the field, such as Prof. Itzhak Benenson, Prof. Michael Batty and Prof. Roger White.

de Sousa, L. and Silva, A.R.. A Domain Specific Language for Spatial Simulation Scenarios (DSL3S): Introduction and Tool Support. SAC’15. April 13-17, 2015.
This communication presented the DSL3S framework at a fully refined stage. Further feedback was gathered from the computer science community, and experiences were exchanged with researchers applying similar methodologies in other domains.

de Sousa, L. and Silva, A.R.. Showcasing a Domain Specific Language for Spatial Simulation Scenarios with case studies. 55th ERSA Congress, August 25-29, 2015.
In this article the DSL3S models for three classical spatial simulation applications were presented. The communication was attended by scientists employing simulation methodologies in different fields, who provided generally positive feedback and showed interest in trying DSL3S.

1.8.2 Journal Articles

Müller, B., Balbi, S., Buchmann, C. M., de Sousa, L., Dressler, G., Groeneveld, J., Klassert, C. J., Le, Q. B., Millington, J. D. A., Nolzen, H., Parker, D. C., Polhill, J. G., Schlüter, M., Schulze, J., Schwarz, N., Sun, Z., Patrick, T., and Weise, H.. Standardised and transparent model descriptions for agent-based models: Current status and prospects. Environmental Modelling & Software, 55(0):156 – 163, 2014.
A position article resulting from a special session on agent-based modelling at the International Congress on Environmental Modelling and Software, held in Leipzig in 2012. This article expresses the need for a unified methodology to describe and communicate agent-based models, targeting primarily peers, but also stakeholders. Various avenues for such a methodology are explored, including the specialisation of the Unified Modelling Language (UML).

de Sousa, L. and da Silva, A.R.. A Domain Specific Language for Spatial Simulation Scenarios. GeoInformatica, 19 (5), 2015.
This article provides a thorough account of the DSL3S language and its accompanying framework. The language abstract and concrete syntaxes and semantics are detailed, as are the technologies on which the framework was developed. A series of use cases is provided, exemplifying the usage of DSL3S in different scenarios. Finally, the language is compared with previous spatial simulation DSLs.


1.9 Thesis Outline

The remainder of this document is outlined as follows:

Chapter 2 consolidates various notions in the domain and then proceeds with a review of related work. It starts by unravelling a number of key concepts in spatial simulation. It then provides introductory concepts on cellular automata and agent-based modelling, noting similarities and differences between the two techniques. A review follows of a series of previous DSLs developed for spatial simulation. The chapter closes with an account of the MDD methodology.

Chapter 3 provides a general overview of spatial simulation as a spatial analysis process, detailing core activities, the actors involved and the resulting assets. The impact of DSL3S on this process is then explained, noting the improvements and simplifications expected from its employment.

Chapter 4 presents the abstract and concrete syntaxes and the structural semantics of DSL3S. It also proposes a structure of views for model organisation and a set of icons for increased graphical expression.

Chapter 5 reviews the technologies used to develop the prototype framework implementing DSL3S and then outlines the various support materials produced to facilitate its usage.

Chapter 6 details the processes employed to evaluate DSL3S. First, the application of the language to various classical use cases of spatial simulation is presented. The results from the various test sessions where user experience was collected are then reported. The chapter closes with a comparison of DSL3S with previous DSLs in the field.

Chapter 7 summarises the thesis and its results and discusses future work.


2 Related Work

Contents
2.1 Concepts of Spatial Simulation
2.2 Concepts of Cellular Automata and Agents
2.3 Previous DSLs for Spatial Simulation
2.4 Model-Driven Development


This chapter reviews previous work related to this thesis. Beforehand, various concepts are consolidated to clarify the matters discussed. Section 2.1 deepens notions specific to spatial simulation, essential to the description of associated activities and artefacts. Section 2.2 introduces core concepts of spatial simulation techniques, noting the differences between cellular automata and agent-based methodologies. Section 2.3 starts the related work review, going through a series of DSLs previously developed for this domain. Finally, Section 2.4 briefly reviews the MDD methodology and outlines the improvements it may bring about when applied to spatial simulation.

2.1 Concepts of Spatial Simulation

This section elaborates various concepts related to spatial simulation previously introduced. These descriptions consolidate their usage throughout the remainder of this document.

Spatial Analysis is a discipline in which different techniques and algorithms are applied to the study of spatial data [O’Sullivan and Unwin, 2010]. The broad goal in this discipline is to produce or synthesise relevant information from existing data. This may be information on location, spatial autocorrelation, scaling, spatial patterns, spatial processes and many more. Usually, this new information is relevant in decision making, managing or planning contexts [de Smith et al., 2015]. Spatial analysis is most commonly performed on geo-referenced data but may be performed on other geometrical data and even on astronomic features. The number of techniques employed in spatial analysis is rather vast, with some still in early stages of maturity. Spatial simulation is one of these techniques. Spatial analysis follows a particular process – a thorough succession of tasks – where different actors intervene, producing various artefacts; this process is detailed in Section 3.1.

Spatial Analyst is the principal actor in the spatial analysis process, usually an individual trained in Geography or related disciplines who possesses knowledge of a vast number of spatial analysis techniques. The spatial analyst is first of all able to choose the most appropriate technique to tackle a particular problem. The analyst is then able to correctly employ these techniques and present results in simplified and transparent ways. In the particular case of spatial simulation, the spatial analyst might not entirely master the techniques necessary for its application if programming is involved.

During the analysis process, a simulation Model is produced [Law, 2007], capturing the elements of change – the spatial dynamics – at a conceptual level. It may be a simple textual description of how spatial features evolve and behave; it can also take the form of a set of rules (common with cellular automata), a set of logical formulas or of mathematical functions. An example in a land use model is a rule determining that if more than half of the neighbourhood of a cell is crop-land, the cell can no longer support a forest. The model should provide all actors involved with an understandable formulation of which spatial features evolve – and how – during simulation.

A model Implementation is a computer program that executes a particular model, encoding all its behaviours, rules, logic and functions. In case a Program-level tool is used, this implementation comprises the source code and executables produced. A Model-level tool is an implementation itself, but for a particular analysis process it may also add the settings of each individual parameter.


A Simulation is an execution (or run) of a model implementation on a particular set of data inputs and start conditions [Clark, 1990, Epstein and Axtell, 1996]. During simulation the spatial dynamics encoded by the model implementation are successively applied to the input data and features for an established number of time steps. The state of the data and features at the end of execution constitutes the simulation Result(s). The set of input data and initial conditions is also referred to as the simulation Configuration.
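The terms defined in this section can be tied together with a small sketch. The Python below is an illustration only; it is not DSL3S output nor the interface of any tool cited here. The land use rule from the text plays the role of the model, `step` and `simulate` stand for a model implementation, and the 3 x 3 grid is a made-up simulation configuration. All names, the use of the eight surrounding cells as neighbourhood, and the choice of turning an unsupported forest cell into crop-land are assumptions.

```python
from typing import List

Grid = List[List[str]]  # each cell holds a land use label

FOREST, CROP = "forest", "crop"


def moore_neighbours(grid: Grid, row: int, col: int) -> List[str]:
    """Return the labels of the (up to eight) cells surrounding (row, col)."""
    labels = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                labels.append(grid[r][c])
    return labels


def step(grid: Grid) -> Grid:
    """Model implementation: apply the land use rule once to every cell."""
    new_grid = [row[:] for row in grid]  # the input grid is left untouched
    for r, row in enumerate(grid):
        for c, label in enumerate(row):
            around = moore_neighbours(grid, r, c)
            crop_share = around.count(CROP) / len(around) if around else 0.0
            # Assumption: a forest cell that loses support becomes crop-land.
            if label == FOREST and crop_share > 0.5:
                new_grid[r][c] = CROP
    return new_grid


def simulate(configuration: Grid, time_steps: int) -> Grid:
    """Simulation: run the implementation for a fixed number of time steps."""
    grid = configuration
    for _ in range(time_steps):
        grid = step(grid)
    return grid


# Simulation configuration: input data and initial conditions (hypothetical).
configuration = [
    [CROP, CROP, CROP],
    [CROP, FOREST, FOREST],
    [CROP, FOREST, FOREST],
]
result = simulate(configuration, time_steps=1)  # the simulation result
```

In this configuration only the central forest cell has more than half of its neighbourhood under crop-land, so it is the only cell that changes in the first time step; running more steps lets the change propagate towards the corner.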

2.2 Concepts of Cellular Automata and Agents

The following subsections introduce a series of concepts that detail the workings of cellular automata and agents. A number of characteristics distinctively define each of these techniques, even though their realisation may vary in nuance between different implementations.

2.2.1 Cellular Automata

In computer science an Automaton is any element capable of processing inputs from its surroundings and altering its characteristics or state according to a set of rules. The term Cellular Automata specifically refers to the class of computer models that descend from the von Neumann concept referred to in Section 1.2, comprising a gridded domain with each automaton occupying a cell. De Smith et al. [2015] describe cellular automata with five building blocks:

• Spatial framework is a discrete partition of space, usually a regular grid where cells may be squares, hexagons, rectangles or triangles. It is also possible to run cellular automata on irregular lattices, such as those defined by Voronoi polygons. In spatial analysis, cellular automata are mostly run on two-dimensional grids, resulting from raster data sets; however, models in one or three dimensions can be useful too.

• State variables are attributes that describe a particular automaton and may be defined by any sort of data type. In the spatial domain they usually describe the landscape or other spatial features (e.g. land use classification, pollution levels, meteorological conditions).

• Neighbourhood structures define how each automaton interacts with its surroundings. The neighbourhood is the set of cells from which the automaton receives inputs to determine its state. On a traditional square grid two types of neighbourhoods are most commonly used: the eight cells with which the central cell shares a vertex or a boundary line (Moore neighbourhood), or the four cells with which it shares only a boundary line (von Neumann). Figure 2.1 shows three neighbourhood structures; though many other schemes are possible, the first two are by far those most commonly used, for their local focus and symmetry. Other concepts of neighbourhood may be employed, with the inclusion of distance weighting or lagged neighbourhoods. However, authors such as Batty [2007] classify these latter neighbourhood schemes outside the traditional realm of cellular automata.

• Transition rules determine how automaton state variables evolve, both with the passage of time and with changes in the neighbourhood. Although very simple rules are enough for complex behaviours to emerge, as in the Game of Life, far more refined rules have been used, including probabilities, that aim at capturing as closely as possible the dynamics of the phenomena under study.

• Time is defined as a discrete set of steps of equal abstract length. At each step all automata are evaluated; from the present state of an automaton and the states of its neighbourhood a new state is computed. This process takes place synchronously for each automaton, meaning that when calculating the new state, the neighbourhood is accessed strictly in the present time step. When a new state has been obtained for all the automata, these become the present states; the time is then advanced and the process starts once again.
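The five building blocks above can be illustrated with a minimal sketch in Python, assuming a square grid held as a list of lists, binary states and the Game of Life rule mentioned in the text. The names `MOORE`, `VON_NEUMANN`, `count_alive` and `step` are hypothetical, and treating cells beyond the grid edge as dead is an assumption of this example, not a requirement of the technique.

```python
from typing import List, Sequence, Tuple

Grid = List[List[int]]

# Relative offsets of the eight Moore neighbours and the four von Neumann neighbours.
MOORE = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
VON_NEUMANN = [(-1, 0), (0, -1), (0, 1), (1, 0)]


def count_alive(grid: Grid, row: int, col: int,
                offsets: Sequence[Tuple[int, int]]) -> int:
    """Count live neighbours; cells outside the grid count as dead (assumption)."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for dr, dc in offsets:
        r, c = row + dr, col + dc
        if 0 <= r < rows and 0 <= c < cols:
            total += grid[r][c]
    return total


def step(grid: Grid, offsets: Sequence[Tuple[int, int]] = MOORE) -> Grid:
    """Synchronous update: every new state is read from the present grid only."""
    new_grid = [[0] * len(grid[0]) for _ in grid]
    for r, row in enumerate(grid):
        for c, state in enumerate(row):
            n = count_alive(grid, r, c, offsets)
            # Game of Life transition rule: birth with exactly three live
            # neighbours, survival with two or three.
            new_grid[r][c] = 1 if (n == 3 or (state == 1 and n == 2)) else 0
    return new_grid


# A horizontal "blinker" turns vertical after one step and back after two.
blinker = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
```

Writing the results into a separate `new_grid` is what makes the update synchronous: no automaton ever sees a neighbour state that belongs to the next time step.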

Figure 2.1: Three neighbourhood types, from left to right: von Neumann, Moore, and Extended Moore.

The set of all automata in a particular simulation is commonly referred to as the World. The set of all states of a world at a specific time step is called a Configuration. More formally, the application of rules can be considered as a function that takes a configuration as argument and returns a new configuration.

Cellular automata have been used extensively to model spatial dynamics; both their early conception and their simplicity have kept them popular through time. Land cover change is possibly the field of research where these tools have been used most extensively [Messina and Walsh, 2000] [Torrens, 2000], but they have also been used to study wildfires [Li and Magill, 2001], urban dynamics [Batty, 2007] and biology [Ermentrout and Edelstein-Keshet, 1993]. Figure 2.2 shows the output from a wildfire simulation, where fire spreads to adjacent healthy vegetation cells according to available biomass and wind direction; green cells portray healthy vegetation and red cells burning vegetation.
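The view of rule application as a function from one configuration to the next can be sketched with a toy wildfire automaton. This is not the model of Li and Magill [2001]; it is a deliberately simplified, deterministic illustration in which the states, the `fire_step` and `run` functions, the biomass test and the treatment of wind are all assumptions made for this example.

```python
HEALTHY, BURNING, BURNT = 0, 1, 2
VON_NEUMANN = [(-1, 0), (0, -1), (0, 1), (1, 0)]


def fire_step(config, biomass, wind):
    """Take a configuration and return the next one.

    Assumed rules: a burning cell becomes burnt; a healthy cell with
    biomass ignites if a von Neumann neighbour is burning, except when
    the fire would have to travel directly against the wind.
    `wind` is the (row, column) offset towards which the wind blows.
    """
    rows, cols = len(config), len(config[0])
    new_config = [row[:] for row in config]
    for r in range(rows):
        for c in range(cols):
            if config[r][c] == BURNING:
                new_config[r][c] = BURNT
            elif config[r][c] == HEALTHY and biomass[r][c] > 0:
                for dr, dc in VON_NEUMANN:
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    # The neighbour lies downwind of this cell, so the
                    # fire would need to spread against the wind.
                    upwind_of_fire = (dr, dc) == tuple(wind)
                    if config[nr][nc] == BURNING and not upwind_of_fire:
                        new_config[r][c] = BURNING
                        break
    return new_config


def run(config, biomass, wind, steps):
    """Apply the configuration function repeatedly."""
    for _ in range(steps):
        config = fire_step(config, biomass, wind)
    return config


start = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
biomass = [[1, 0, 1],
           [1, 1, 1],
           [1, 1, 1]]
east = (0, 1)  # wind blowing towards increasing column index
```

With these inputs the fire reaches the southern and eastern neighbours of the centre in one step, while the northern cell (no biomass) and the western cell (upwind) remain healthy.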

2.2.2 Agents

Agent-based models are built with multiple autonomous entities that interact with each other and/or the surrounding environment [Weiss, 1999]. These entities can be as simple as basic elements reactive to external stimuli or as complex as goal pursuing agents, even with learning capabilities. In a single simulation several different agent types can exist. As with cellular automata, agent-based simulations are paced by an abstract clock setting the moments in time when agents should act and/or send stimuli to other agents [Epstein and Axtell, 1996].

In the spatial simulation domain, agents exist in a space defined by the enclosing environment variables. They are usually defined as mobile entities; however, static variants can also exist, as can agents that act on non-geographic planes but impact certain sectors of space (e.g. a farmer in a land use simulation).

Figure 2.2: A Wildfire simulation with cellular automata (from Li and Magill [2001]).

Simulation of social behaviour was the root of agent-based modelling, which became the prime computer tool for the study of emergent behaviour [Wolfram, 2002]. In these simulations, the evolution of an entire society or system can be observed and synthesised from agent level rules and stimuli. The application of agent-based modelling has gone well beyond the spatial context, tackling a vast number of issues [Samuelson and Macal, 2006].

Early spatial agent-based models were sometimes implemented as cellular automata, with agents occupying a single cell at a time and moving according to specific neighbourhood rules [Dewdney, 1988]. This way the spatial interaction between agents (e.g. proximity stimuli) was greatly simplified. Modern implementations more often provide agents with all degrees of freedom regarding movement, effectively rendering the simulation space continuous. This latter aspect is one of the most important characteristics setting agent-based simulations apart from cellular automata.

Franklin and Grasser [1997] outlined a set of properties that have been adopted as a general definition of what identifies an agent:

• Autonomous – an agent controls its own actions, independently of any centralised cognisance or processing.

• Continuous – it is underpinned by a process that runs continuously.

• Reactive – responds in timely fashion to stimuli from the surrounding environment.

• Proactive – or goal-oriented, purposeful. Beyond reacting to external stimuli, an agent acts on a pre-set (or evolving – see ahead) goal; it has a raison d'être.

• Mobile – able to transport itself from one point in space to another.

• Sociable – an agent is aware of being part of a society and is able to communicate and/or interact with other agents.

• Character – possesses personality and emotional state.

• Adaptive – builds experience and changes its behaviour accordingly.

In practice most simulations use agents that do not present all the characteristics outlined above. Even so, the first three – Autonomy, Continuity and Reactivity – can be postulated as indispensable for an agent to exist as such; in spatial simulation, Mobility may be added to these. The blurry boundary remaining between the concepts of cellular automata and agents has led the latter nomenclature to dominate, being used to refer to most spatial simulations, especially with the advent of OO programming, a technique more suited to the agent concept [Batty, 2007].

Agent-based models have been applied in many fields, namely: Politics and Warfare [Ilachinski, 1997] [Lustick, 2002], Biology [Eidelson and Lustick, 2004] [Krawczyk et al., 2005], Economics [Axtell, 1999], Traffic [Nagel and Rasmussen, 1994], Archaeology [Gumerman et al., 2003] and more; even Terrorism has been studied with these methods [North et al., 2004]. Figure 2.3 portrays an urban disaster simulation. Red, green and blue dots represent emergency agents: police, fire-fighters and ambulances; black dots represent civilians and crosses are road blocks. Damaged buildings appear coloured.

Figure 2.3: A disaster simulation in a city.
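Some of the agent properties discussed in this subsection (autonomy, reactivity, mobility in a continuous space, and pacing by an abstract clock) can be illustrated with a minimal Python sketch. The `Agent` class, its `act` method, the `run` function and the notion of `targets` are hypothetical names introduced for this example only; they do not correspond to any particular toolkit.

```python
import math
from dataclasses import dataclass


@dataclass
class Agent:
    """A mobile agent living in a continuous two-dimensional space."""
    x: float
    y: float
    speed: float = 1.0

    def act(self, targets):
        """Autonomous and reactive: the agent itself selects the nearest
        target in its environment and moves towards it."""
        if not targets:
            return
        tx, ty = min(targets,
                     key=lambda t: math.hypot(t[0] - self.x, t[1] - self.y))
        dist = math.hypot(tx - self.x, ty - self.y)
        if dist <= self.speed:
            # Close enough to arrive within this time step.
            self.x, self.y = tx, ty
        else:
            # Move one step of length `speed` along the straight line.
            self.x += self.speed * (tx - self.x) / dist
            self.y += self.speed * (ty - self.y) / dist


def run(agents, targets, steps):
    """Abstract clock: at every tick each agent is given the chance to act."""
    for _ in range(steps):
        for agent in agents:
            agent.act(targets)


walker = Agent(0.0, 0.0)
run([walker], [(3.0, 4.0)], 2)  # two ticks along a path of length five
```

Unlike the cellular automaton case, positions here are real-valued coordinates, so movement is not constrained to a grid.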

2.3 Previous DSLs for Spatial Simulation

Some of the problems outlined in Section 1.4, in particular the difficulties faced by spatial analysts in implementing spatial simulation models, have long been identified [Fall and Fall, 2001]. Realising the gap between Program-level and Model-level tools, various authors have proposed DSLs for spatial simulation. These languages attempt to bring the implementation closer to the model, providing constructs nearer to natural speech and hiding some lower level programming aspects. They provide the spatial analyst with a more approachable implementation framework, while avoiding the shortcomings of Model-level tools. In this section some of these previous DSLs are briefly described, noting their most important constructs and relevant implementation aspects. A wider review of spatial simulation tools can be found in [de Sousa and Silva, 2011b].

2.3.1 NetLogo

StarLogo started as a specialisation of the Logo functional programming language, directed at agent-based simulations. It was an educational project at MIT to help students explore emergent behaviour. StarLogo was progressively transformed into a multi-platform tool with the adoption of Java as execution environment; eventually it evolved into a spin-off named NetLogo.

The lexicon of NetLogo is composed of four main concepts, all different kinds of agents: (i) turtle - an agent capable of moving across the simulation space; (ii) patch - a static subdivision of the simulation space; (iii) link - a relation between two turtles; (iv) observer - a non-spatial agent capable of collecting data from, and providing data to, other agents. Agents can themselves contain variables to store data and can be grouped in agentsets.

Both StarLogo and NetLogo are relatively easy to learn, especially when compared to Program-level tools, dispensing with the higher skills needed to use an OO language [Railsback et al., 2006]. An integrated text editor supports swift development and the exploration of model dynamics. A vast library of over 300 pre-built models has been gathered for education purposes¹, covering a wide range of disciplines. More recently an extension² for spatial data input was made available, although it is entirely reliant on ESRI data formats for both rasters and vectors. Berryman [2008] reports that this extension requires advanced programming skills to master.

Of the various DSLs attempted in this field, NetLogo seems to be the most popular, retaining a large number of users. In great measure this is due to its fast prototyping capabilities, to which the integrated text editor greatly contributes. However, readability issues common to traditional programming languages slowly emerge with larger and more complex models, especially if spatial data is involved.

2.3.2 SELES

The Spatially Explicit Landscape Event Simulator (SELES) is the product of a research project at the Simon Fraser University, a declarative DSL for Landscape Dynamics [Fall and Fall, 2001]. SELES was conceived to be used closely with GIS software, supporting a vast range of different raster formats (most common in Land Use / Land Cover data) for landscape data input. SELES also takes as input a set of global variables and the declaration of several landscape events and agents. Landscape events describe the model dynamics, each requiring the declaration of a spatial domain and recurrence frequency. For each event a spreading mechanism is specified, along with how it affects its neighbourhood.

Even though they use keywords closer to the context of simulation, models coded with SELES are somewhat reminiscent of third generation languages, with distinct data and procedure environments, still leaving many usual coding activities to the user. It is a good example of a DSL that, while doing away with some of the complexity of traditional programming languages, achieves little in terms of abstraction.

SELES is shipped with a dedicated code editor and a simulator that runs the model by interpreting the code files and reading in the spatial data. At run time the simulator displays the model in a graphical interface. Both these programs are available free of charge as closed executables for Microsoft operating systems.

1 http://ccl.northwestern.edu/netlogo 2 http://ccl.northwestern.edu/netlogo/docs/gis.html

2.3.3 MOBIDYC

MOBIDYC (Modelling Based on Individuals for the Dynamics of Communities) is an agent-based approach to the study of population dynamics, directed at the fields of Biology and Ecology [Ginot et al., 2002]. It was conceived to provide a tool accessible to non-programmers, particularly biologists. In essence, MOBIDYC is a Smalltalk code package, defining a set of simple primitives, such as environment, agent and state, plus a set of pre-defined behaviours. A model first requires the creation of agents and their respective states; behaviours are coded with primitive relations between the names of state variables, such as arithmetic operations. Observing agents that collect data can be added, but results are made available only in tabular format.

Models developed with MOBIDYC can be quite fluid and easy to understand, if targeting Biology related problems; in other domains the semantics of the code can become harder to grasp. There is no explicit mechanism to interact with GIS software; MOBIDYC was conceived to run primarily on purely artificial spaces.

The source code is open and free, but is dependent on VisualWorks, a commercial IDE. More recently the development of MOBIDYC has focused on specific menus in VisualWorks that produce code blocks for standard model components. The reliance on this IDE provides wide portability to MOBIDYC, running on Microsoft, Macintosh and Linux operating systems.

2.3.4 Ocelet

Ocelet is a declarative DSL for landscape dynamics aimed at tackling common difficulties in capturing space-time dynamics with traditional modelling techniques [Degenne et al., 2009]. It takes an unconventional approach to this field by mimicking the concept of service-oriented architecture, with model components interacting with each other through services.

The developer has five principal constructs available to declare a model with Ocelet: (i) entity - a component that provides a set of services; (ii) service - communication port of an entity, accepting a set of arguments and returning a set of results; (iii) relation - bonding entities through their services (when compatible); (iv) scenario - describing which relations within an entity have to be activated, and when; (v) datafacer - a device through which entities access data. Entity behaviour is coded with particular actions behind each service using mathematical expressions.

The double paradigm of this language presents a novel approach to spatial simulation, but it is not entirely clear if it eases model understandability. Users lacking a background in computer science may find the service-oriented architecture alien and hard to frame with spatial simulation. On the other hand, the service-oriented paradigm provides a level of abstraction over the general purpose of an Ocelet model that is lacking in the other languages reviewed here. However, as the amount of code required to describe larger models expands, this abstraction slowly dilutes.

The language is supported by two Eclipse plug-ins: a language editor and a code generator. The artefacts generated are Java classes that can be compiled and executed on multiple operating systems or platforms. Regarding interoperability, Ocelet only supports a restricted number of vector data file formats.

2.3.5 GAML

In recent years a consortium of French and Vietnamese research centres and universities has developed an agent-based modelling IDE called GAMA [Grignard et al., 2013]. It is conceived to support large models and to provide seamless integration with spatial data. This IDE interprets a textual DSL called GAML (GAMA Modelling Language).

A model in GAML is declared in similar fashion to SELES: a structured file composed of sequences of statements that can either be declarative or imperative. The core concept of this language is Species, essentially an agent class, that underpins most other concepts. A GAML model is structured into four code sections: (i) Header - setting the model name and optionally importing other model files; (ii) Global species - declaring a special species called "world agent" enclosing global properties of the model; (iii) Species and grids - where classes of agents and grid topologies (discrete spatial variables) are declared; (iv) Experiments - special agents that carry out the execution of the model, which may be of two types: gui and batch.

A species is defined as a set of attributes plus a set of actions and behaviours. Behaviours include: reflex - a sequence of statements that can be executed at each time step; init - a special form of reflex that is evaluated only once when the agent is created; task - a reflex with an associated weight that determines its execution priority in the scheduler; and state - determines if the agent should enter/leave a particular state at each time step. Beyond four primitive data types (bool, float, int and string), GAML supports several advanced features found in general purpose programming languages: loops, iterators and data structures such as lists, maps or matrices.

Of the DSLs reviewed here GAML is possibly the most versatile, with a wider range of application, due to an extensive number of features and constructs, as Figure 2.4 shows. Eventually, it may come to build a relevant user community like NetLogo did. Nevertheless, like SELES, GAML still mimics in various ways early third generation languages (such as COBOL) with strict environments for specific code sections. Mastering a language of this depth is naturally a lengthy process, presenting a relevant challenge for less experienced users.

At this stage GAMA supports three spatial data file formats: the Shapefile vector specification issued by ESRI [ESRI, 1998], the ASCII raster format also specified by ESRI, and OpenStreetMap vectors. However, full documentation is only available for the Shapefile format. GAMA is built on Eclipse, runs on Java and is released under an open source licence.

Figure 2.4: Hierarchy of basic types in GAML.

2.3.6 AML

A graphical DSL not conceived for spatial simulation, but worthy of mention, is the Agent Modelling Language (AML) [Trencansky and Cervenka, 2005]. It was developed for social dynamics and is reliant on the Model-Driven Architecture (MDA) infrastructure (vide Section 2.4), extending a wide range of UML meta-classes. Its concepts are organised hierarchically, through several levels of generalisation. At the top is the concept of semi-entity, an abstract element that can be of two types: behavioured or socialised; the former represents elements that can act on their environment, the latter specifies elements that can form societies and participate in social relationships.

The concrete building blocks of AML are entities, which can be of three types: (i) agents - capable of interactions, observations and autonomous behaviour; (ii) resources - physical or informational entities whose availability is constrained; and (iii) environments - logical or physical surroundings that determine under which conditions entities can exist and function. Three other main concepts model social dynamics: (i) structures - identifying societies and roles; (ii) behaviour - constructs for communication, observation, reaction and services; and (iii) attitudes - describing individual agent drivers: needs, intentions, goals, beliefs. There is much more to AML: constructs to specify mental agent aspects and even concepts to describe model deployment and execution.

No interpreter or model-to-code transformation infrastructure has ever been developed for AML and no applications could be found in the literature. It is possible that developing a full transformation for such a detailed language presents too much of a challenge. On the other hand, at the time AML was published, the tools on which transformations could be developed were far fewer and less mature than today. AML presents itself as a resource with great potential that is yet to be fulfilled.

2.3.7 Discussion

The DSLs reviewed in this section have remained restricted to textual paradigms, retaining important drawbacks of Program-level tools. They can ease model development and reduce the build-up time in prototyping, but do not fully avoid the need for programming skills. As with general purpose programming languages, the user has to understand the meaning of keywords and how to compose a coherent set of instructions or declarations into a specific model. In some cases (e.g. SELES), the final programme may be spread across various code files that must be correctly linked together.

Apart from AML, these previous DSLs focus on providing a refined concrete syntax but are still framed in older programming paradigms, emanating from declarative or functional languages. What is more, DSLs like GAML or Ocelet are so deep that they may take as long to master as ordinary Program-level tools. Some of these DSLs were clearly developed for educational purposes, more as prototyping than analysis tools.

Lack of GIS interoperability is an issue for some of them (most evident with MOBIDYC). The best of these languages in this regard – GAML – supports only three different geo-spatial formats. Platform or operating system dependency is also a common issue; in reality, some of these languages are more limited in technical terms than in their ontological scope. Lacking any implementing infrastructure, AML is an acute case in this latter aspect.

NetLogo appears to be the only one of these languages with a large enough user community, granting it regular employment and reference in peer-reviewed literature [Lytinen and Railsback, 2012]. GAML could possibly achieve such status, but is as yet in an earlier phase of maturation. At the opposite end, interest seems to be waning in SELES and MOBIDYC.

In spite of a few successful cases, there seems to be something amiss with these languages. None of them fully succeeds in abstracting the core activity of modelling from underlying technicalities. AML is the clear exception, but it is not specific to spatial simulation and is not implemented. The need for a different approach to DSLs in this field is apparent.

2.4 Model-Driven Development

Modelling is an activity not only relevant in spatial simulation; in fact it plays an indispensable role in classical engineering disciplines. Modelling allows engineers to study large and complex systems from a higher level of abstraction [Atkinson and Kühne, 2003]. In the still infant field of software engineering, modelling is yet to be widely adopted [Clark and Muller, 2012], although in many cases end users are requiring systems with a degree of complexity that goes well beyond the abilities of traditional software development tools [France and Rumpe, 2007]. Moreover, the integration with parallel disciplines (systems engineering, software engineering, control engineering, business process engineering, etc.) can be greatly simplified with proper modelling tools [Giese and Henkler, 2006].

Model-Driven Engineering (MDE) is a software engineering paradigm, whereby modelling evolves from mere code documentation to the core activity of software development [Silva, 2015]. Model-Driven Development (MDD) is an analysis- and design-centred approach to MDE, focused on the definition of modelling languages specifying a context under study [Atkinson and Kühne, 2003]. The successful application of MDD requires a fundamental shift in the way software engineers use models, evolving from ad hoc complementary documents to the main focus of their work, thus relegating coding to the background. This is achieved through model-to-model and model-to-code transformations, and in some cases by direct model execution. With MDD, source code becomes a sub-product of the development process, shifting focus from how the system functions to what the system must do [Selic, 2003].

Figure 2.5 compares the traditional software engineering process with the MDD methodology. In traditional software engineering, models are created ad hoc for a particular project during the system design activity. However, they must often be updated during the documentation activity, which takes place only after the system is at least partially coded. Moreover, the process flow can create a loop between the testing activity and the coding activities. The divergence of models from the final documentation and the code itself is therefore a common occurrence. With MDD the development process is completely streamlined, with any resulting artefacts synchronised at any moment during development. In addition, MDD models make use of a DSL with consolidated domain specific concepts.

The motivation behind MDD in the software development field is the gain in productivity and quality it can yield through automatic model-to-code transformation. But further advantages have been identified that justify its application to other domains. In the first place, there is the increase in understandability, especially since MDD mostly relies on graphical constructs, more expressive by nature, but also because it dispenses with the text parsing needed to comprehend source code [Selic, 2003]. Secondly, it promotes fast prototyping, by allowing model execution from a high level of abstraction, before much effort or resources are committed to development. This allows early model validation and, later on during the model refinement process, the identification of unintended or undesired model changes [Selic, 2008, Mohagheghi et al., 2013].

Furthermore, MDD offers the possibility to create user-definable mappings, capturing domain specific concepts at an ontological (or meta-model) level, producing a lexicon of model constructs totally independent of particular code languages or specific software platforms [Atkinson and Kühne, 2003, France and Rumpe, 2007]. Finally, it is important to note that a successful application of MDD also brings forward an increase in interoperability, by offloading such technical concerns to the model-to-code transformation infrastructure, which can be adapted to match particular environments or platforms [Atkinson and Kühne, 2003, France and Rumpe, 2007].

Figure 2.5: Traditional software engineering process compared to the MDD methodology.

The Object Management Group (OMG) specified the Model-Driven Architecture (MDA)³ methodology as a concrete MDD approach. The UML 2.0 modelling language allows the extension of its core primitives (graphical elements, links, etc.) through specialisation for different application domains [OMG, 2005]. This is achieved with the definition of a UML Profile, a collection of stereotypes, properties and constraints. Stereotypes are specialisations of existing UML model elements, defining new elements representing narrower abstractions. A semantically related set of stereotypes, specified by properties and restrictions, can thus be used to customise UML into a new specialised language dedicated to a certain domain.

The Information Systems Group of INESC-ID⁴, a research centre associated with the Instituto Superior Técnico of the Universidade de Lisboa, now has close to a decade of experience in this field, applying MDD and Language Engineering techniques in different contexts, particularly through the ProjectIT initiative [Silva et al., 2007a, Silva et al., 2007b, Saraiva and Silva, 2009, Ferreira and Silva, 2012, Ribeiro and Silva, 2014].
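The idea of model-to-code transformation discussed in this section can be made concrete with a deliberately small Python sketch. It assumes a platform-independent model expressed as a plain dictionary and a template-based generator; the `MODEL` structure and the `generate_class` function are hypothetical illustrations and bear no relation to the MDA tooling or to any specific transformation language.

```python
# A platform-independent description of one model element (assumed format).
MODEL = {
    "name": "Fire",
    "attributes": {"intensity": 1.0, "spread_rate": 0.5},
}


def generate_class(model):
    """Model-to-code transformation: emit Python source for a model element."""
    lines = [f"class {model['name']}:", "    def __init__(self):"]
    if not model["attributes"]:
        lines.append("        pass")
    for attribute, default in model["attributes"].items():
        lines.append(f"        self.{attribute} = {default!r}")
    return "\n".join(lines) + "\n"


# The generated source is a by-product of the model, never edited by hand.
source = generate_class(MODEL)

# Executing the generated code yields a usable class.
namespace = {}
exec(source, namespace)
fire = namespace["Fire"]()
```

Targeting another platform would mean swapping the generator while leaving `MODEL` untouched, which is the interoperability argument made above in miniature.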

3 http://www.omg.org/mda/ 4 http://www.inesc-id.pt/

3 Proposed Approach

Contents
3.1 The Spatial Simulation Analysis Process
3.2 Spatial Simulation with DSL3S

This chapter provides a broad view of what and how DSL3S is intended to change relative to the traditional praxis of spatial simulation. Section 3.1 describes spatial simulation in light of the general spatial analysis process in the GIS domain, identifying its main tasks, the actors involved and the artefacts produced. Section 3.2 lays down a general vision of how DSL3S is expected to be employed in the spatial analysis process and which tools are necessary for its adoption; a simplified spatial simulation analysis process is then produced from this vision.

3.1 The Spatial Simulation Analysis Process

Spatial simulation is a well defined process, a succession of activities or tasks, each with specific outcomes. This section reviews this process, describing the composing tasks, the artefacts produced and the participating roles.

3.1.1 Roles

Three essential actors participate in the spatial simulation analysis process: the Stakeholder, the Spatial Analyst and the Computer Scientist. Their roles in the process are described below.

The process starts with stakeholders (e.g. domain experts, policy makers) identifying a particular problem for which there is no straightforward solution. Eventually, the necessity for spatial analysis becomes evident and spatial analysts are involved in the process. Together, stakeholders and spatial analysts start designing a plan to tackle the problem, identifying spatial simulation as the appropriate technique. At this time, a discussion of a possible model can already take place at conceptual level. Together, these actors should reach a common understanding of the problem and how it should be tackled; from then onwards the spatial analyst leads the process.

At later stages during the analysis process, spatial analysts may require the involvement of computer scientists to implement the model, and later on tune or refine it. This is a likely scenario if a Program-level tool is used for the purpose. The requirement for this third type of actor is a relevant aspect in which spatial simulation differs from the traditional spatial analysis process. When the implemented model reaches satisfactory functionality, the spatial analyst again takes the lead, collecting statistics and synthesising results. When comprehensible and fulfilling results are gathered, spatial analysts convene again with stakeholders, and together they draw practical conclusions.

3.1.2 Tasks

The development of a spatial simulation broadly follows the same tasks as a classical spatial analysis process [de Smith et al., 2015], namely: (1) defining a Problem, (2) setting a Plan, (3) collecting Data, (4) performing the Analysis and (5) drawing a Conclusion. A spatial simulation differs mainly during the analysis task, with a few nuances also in the planning and data collection tasks. Figure 3.1 sketches this process, detailing those activities specific to spatial simulation. A more detailed discussion follows.

Figure 3.1: The spatial simulation analysis process.

First, the Problem in this context corresponds to the identification of a clear question. It can be the need to understand the mechanisms that lead to observed changes in spatial features, and/or a requirement to investigate or predict how these features may evolve in the future. At the problem definition stage it should also become clear that spatial simulation is the correct technique to tackle the problem. Second, the Planning task involves the clear identification of which spatial features are to be considered, which sources may provide historical data, the tools on which to proceed the analysis and the methods used to assess its results. The spatial simulation model starts taking shape at this stage, with an initial conceptualisation of the spatial dynamics involved. Third, Data gathering goes through essentially the same sub-tasks as in a regular spatial analysis process: uniformisation of data sets to a single coordinate system and spatial resolution, guaranteeing spatial coincidence, etc. There is a relevant nuance however, in order to calibrate the model (see ahead) data must be collected for at least two different periods in time; without it the analysis task can not be performed for predictive purposes. Nevertheless, in certain contexts it might be justifiable to proceed the analysis on purely heuristic or empirical dynamics, to investigate the mechanisms of change. It is also important to note that the higher the spatial resolution used, the more sensible the model becomes to local factors. Fourth, the Analysis task, that is divided in five different sub-tasks: (1) Implementation, (2) Simulation on Historical Data, (3) Tuning, (4) Simulation on Analysis Period and (5) Collect Statistics. The analysis starts with the implementation of the model devised in the Planning task. Usually, it comprises the development of a computer program, but can alternatively be the parametrisation of a Model-level tool. 
If historical data are available, this implementation is first run for the known period(s), to assess the quality of the model. Starting from the earliest of these data sets, the implementation is run for the number of time steps required to reach a known subsequent data state. Spatial metrics, such as the percentage of coincident cell states or the dimensions of Voronoi lattices around spatial features of interest, are normally used to compare the results of simulation with these historical data. In light of this assessment, the model and its implementation are tuned and applied again to the historical data until results are satisfactory. Eventually, the model converges to a formulation that satisfactorily mimics the known dynamics of change. The model implementation can thus be used for predictive purposes, and is then applied to the period of interest. Finally, statistics are collected from the results obtained. Typically, several simulations are run and statistics computed to robustly assess likely behaviour (especially when the simulation is fed by random data).

Fifth, the Conclusion, where the results gathered are assessed to understand whether the initial problem was addressed or not. If results are not fully satisfactory, the decision may be taken to return to one of the previous tasks and repeat the process from there.

This is but one way of telling what can be a rather intricate story. For instance, resolution adjustments can be introduced during either the Implementation or the Tuning task, to assess the impact of local dynamics on resulting patterns. This is a hallmark of spatial simulation: it is largely an open process, subject to the idiosyncratic specificities of each application domain.
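As an illustration of the comparison against historical data mentioned above, the following is a minimal sketch of one possible metric, the percentage of coincident cell states between a simulated raster and an observed one. The class and method names are hypothetical and the rasters are assumed to be plain integer grids of equal shape; it is not taken from any of the tools discussed in this thesis.

```java
/** Hypothetical helper for illustration only; not part of DSL3S or MDD3S. */
final class CellCoincidence {

    /** Returns the percentage (0 to 100) of cells holding the same state in both grids. */
    static double percentCoincident(int[][] simulated, int[][] observed) {
        if (simulated.length == 0 || simulated.length != observed.length) {
            throw new IllegalArgumentException("grids must have the same, non-zero number of rows");
        }
        long equal = 0;
        long total = 0;
        for (int r = 0; r < simulated.length; r++) {
            if (simulated[r].length != observed[r].length) {
                throw new IllegalArgumentException("row " + r + " differs in length");
            }
            for (int c = 0; c < simulated[r].length; c++) {
                if (simulated[r][c] == observed[r][c]) {
                    equal++;
                }
                total++;
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("grids are empty");
        }
        return 100.0 * equal / total;
    }
}
```

In a tuning loop, a value of this kind would be recomputed after each adjustment of the model, stopping once it exceeds whatever threshold the analyst deems satisfactory.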

3.1.3 Artefacts

The spatial analysis process starts with the definition of a Problem Statement. It must clearly identify the subject to investigate and what kind of results are required to address the problem. Ideally, the problem should be summed up in a single sentence that is clear to both stakeholders and spatial analysts.

An initial Model comes out of the Planning task [Law, 2007]. At least two essential elements must be identified: actors of change and environment variables. Actors of change are those elements expected to change during simulation, by moving, replicating, spreading, dying, etc. They can be entities such as fire in a wildfire model, urban areas in an urban sprawl simulation, or an infection in a biological simulation. Environment variables are all those data sets that, though not changing by themselves, influence the simulation and may be somehow impacted by the actors of change. Examples are biomass for a wildfire model, land use or slope for the urban sprawl example, and host type for the biological simulation. At this stage the model is often still described informally in plain text, possibly complemented with logical or mathematical formulations.

In the Data Collection task Historical Data sets are gathered [Batty, 2007]. In order to thoroughly tune the model, data for at least two moments in time must be collected for each variable and actor of change. These data sets then have to be uniformised to the same coordinate system, spatial extent and spatial resolution.

The Implementation produces a piece of software able to apply the model to a set of spatial inputs and parameters. In the first place, this implementation codes the rules of change, determining how each input element conditions the way actors of change behave. These can, for instance, take the form of attractors or repellents, e.g., representing a resource that is consumed, setting barriers to movement or development, etc.
In an agent-based implementation, behaviour can be coded as a set of functions parametrised by environment variables, stimuli and internal state; weights are then assigned to each of these parameters in each function, either based on empirical knowledge or on purely heuristic grounds.

Each simulation is characterised by an initial Configuration [Wuensche and Lesser, 1992], a set of input data, parameters and initial conditions. Keeping track of the configuration of each simulation can be particularly relevant later on, during the statistics collection task.

A simulation produces a set of Results, essentially new spatial data sets characterising each element (variables and actors of change) at the end of the run. Several features can be relevant in a simulation result: the specific spatial positioning of changing elements, emerging spatial patterns, thresholds for chaotic behaviour and others. In models for which historical data exist, positioning may be the most relevant feature (e.g. urban sprawl); in models where the dynamics is developed empirically, large-scale patterns are more relevant (e.g. biological modelling).

Statistics are usually collected on a series of simulations, in order to synthesise the results [de Smith et al., 2015]. Statistics make it possible to assess results across the effects of random initial conditions and random behaviour during simulation. They may also disclose model sensitivity to inputs and/or behavioural rules.

3.2 Spatial Simulation with DSL3S

The broad goal of this thesis is to simplify the spatial simulation analysis process, providing spatial analysts with the means to prototype spatial simulation models with graphical diagrams. These models may be parametrised and tuned to the specific application. Graphical models produced with DSL3S can be fed to a model-to-code transformation facility to produce ready-to-run model implementations based on one of the popular Program-level tools. From the results obtained with the implementation, analysts can return to the conceptual level and develop or refine the model further using graphical constructs, in an iterative process. In this fashion spatial analysts focus their work on modelling itself, abstracted from concerns specific to programming, data input or platform dependencies.

DSL3S is an application of the MDD philosophy to the specific field of spatial simulation, as an alternative way to address the problems identified in Section 1.4. By raising the level of abstraction at which development takes place, this approach can facilitate communication between programmers, analysts and other stakeholders lacking training in programming [Mohagheghi et al., 2013]. It can also allow prototyping by non-programmers. By detaching model development from specific technologies, it can improve interoperability with spatial data, generating the appropriate code as needed. Lastly, it can lay the foundations for a standard language in the field, as successful efforts in parallel fields have shown, e.g., SysML1 or ModelicaML2.

This section describes the assets required to fulfil the goals of DSL3S and how they modify the spatial simulation analysis process.

3.2.1 DSL3S Language and Framework

The realisation of the goals outlined so far relies on two essential elements: (i) a DSL for spatial simulation and (ii) a model development framework.

The DSL3S Language is a meta-model, or ontology, synthesising relevant concepts of spatial simulation. It provides a high-level lexicon for the development and description of spatial simulation models, completely independent of implementation aspects. This language is a totally abstract asset, in the sense that it exists above the computing (i.e. physical) domain. DSL3S is formalised through the MDA infrastructure, specialising meta-classes from UML 2.0 into spatial simulation specific stereotypes, which are gathered in a UML Profile.

DSL3S takes spatial simulation as a branch of the wider spatial analysis discipline, where model inputs primarily originate from a GIS and whose outputs also have geo-referenced relevance. The language does not contemplate agents with the internal cognitive capacities that Franklin and Grasser [1997] classify as adaptive agents. Explicit concepts of society, or societal interaction, are not considered either. All actors of change are assumed to exist in the space of simulation, thus necessarily being geographic entities. The language does not distinguish between agent-based models and cellular automata, aiming at a single approach to both schools of spatial simulation and hiding such implementation details from the spatial analyst.

The DSL3S Framework is a software infrastructure providing the means for the actual usage of the language in the spatial simulation analysis process. In the first place, it makes the DSL3S UML Profile available in a graphical development environment. The analyst is able to apply DSL3S stereotypes to graphical elements in a UML diagram, thereby gaining access to the simulation specific properties defined by the language. Secondly, it provides a model-to-code transformation mechanism that seamlessly produces a ready-to-run program implementing the DSL3S model. The framework also provides direct input points to spatial data with specific properties. The analyst needs only to provide the location of the input data set, and the appropriate access mechanism(s) are created by the model-to-code transformation.

1 http://www.sysml.org/
2 https://www.openmodelica.org/index.php/home/tools/134

3.2.2 The Spatial Analysis Process with DSL3S

The usage of DSL3S and the accompanying framework introduces relevant changes to the spatial simulation analysis process described in Section 3.1. Upfront, the role performed by the computer scientist is no longer required, as coding activities are effectively eliminated. Fewer tasks are required, results are obtained faster and stakeholders can be involved more deeply in the process. Figure 3.2 presents this simplified process.

The differences start in the planning task. DSL3S provides a formal and expressive language to develop the model, thus doing away with informal descriptions. This task should become more transparent and engaging for stakeholders, with technical aspects absent from the discussion and a more approachable model. With the model-to-code transformation, trial simulations may already be run at this early stage in the process, still with the involvement of stakeholders. This may help to investigate model aspects that do not require structured input data. The conceptual model resulting from this task is now bound to be formal, eliminating the ambiguities and gaps of informal model descriptions.

After the data collection task, the process can move on directly to the simulation tasks, since the implementation task is no longer required. Uncertainties and errors caused by miscommunication between spatial analysts and computer scientists, or by the coding activity itself, are therefore eliminated. The time span from model development to first results is thus greatly reduced too. The tuning task remains, but now it only concerns the model itself; it also becomes a swifter and simpler activity. Since development is kept at the higher level of abstraction provided by the language, the opportunity opens for the involvement of stakeholders in this task too. Even though such involvement may not be justifiable in all situations, in some cases it may be a favourable aspect, as in participatory decision making.


Figure 3.2: The simplified spatial simulation analysis process using DSL3S.


Beyond these simplifications to the analysis process itself, a final outcome should be noted regarding model communication. A simulation model formalised with DSL3S is readily understandable by peers, be they spatial analysts or stakeholders. The communication of the entire process is thus considerably improved. Moreover, this ease of communication also increases the likelihood of model adoption by peers in other contexts or applications.


4 The DSL3S Language

Contents
4.1 Abstract Syntax
4.2 Concrete Syntax
4.3 Structural Semantics
4.4 Model Organisation - Views
4.5 Proposed Icons

DSL3S is defined as a UML profile composed of a set of stereotypes enclosing various abstractions underpinning spatial simulation [de Sousa and Silva, 2011c, de Sousa and Silva, 2011a, de Sousa and Silva, 2012, de Sousa and Silva, 2016]. These stereotypes can be seen as the conceptual terms used when explaining a simulation with the terminology of this application domain, e.g., describing “fire” as an agent (because it is mobile and transforms the landscape) or “height” as a spatial variable (that has no innate activity but may influence the actions of certain agents). Together these stereotypes form a graphical language, which is formalised in the DSL3S UML profile. This UML profile allows the development of simulation models through the application of these stereotypes, the creation of correct relations between them and the parametrisation of their properties. This chapter details the DSL3S language: Section 4.1 presents its Abstract Syntax, Section 4.2 its Concrete Syntax and Section 4.3 lays out the structural semantics. Section 4.4 introduces guidelines related to model organisation. Section 4.5 proposes a set of icons to render models visually more expressive.

4.1 Abstract Syntax

Three basic constructs can be identified in most kinds of spatial simulation models: Spatial variables, Global variables and Animats (with the latter being indispensable). Spatial variables are spatial information layers that have some sort of impact on the dynamics of a simulation, e.g. slope that deters urban sprawl or biomass that feeds a wildfire. Animat is a term coined by Wilson [1991] signifying artificial animal; in DSL3S it is used more widely, representing all spatial elements that change or induce change in their surroundings; examples are fire (in a wildfire model), urban areas (in a sprawl model) or predators (in a population dynamics model). Global variables provide information that is constant across the space of simulation, such as wind direction in a wildfire model or economic trends in an urban development model. Another important sort of context variable is that supporting Animat internal state: an Animat is composed of a set of Attributes that describe each instance at a certain moment in time.

The elements considered so far focus on the information required to run a spatial simulation, but more is required to capture spatial dynamics: the way animats act and react to the environment has to be made explicit. This character of simulation is termed Operation in DSL3S. The language proposes a set of just six predefined animat operations, intending to match the essential properties of an agent, as outlined by Franklin and Grasser [1997] (autonomous, continuous, reactive, proactive and mobile), with the core concepts found in cellular automata (state, neighbourhood, transition rules and time). In their seminal work, Epstein and Axtell [1996] conceive a considerably larger set of operations, including elaborate processes such as trade and cultural exchange.
The option for a strict set of operations rests on three reasons: (i) to keep the language compact, easy to learn and memorise; (ii) more refined operations are less common in spatial simulation applications and can eventually be composed with these simpler primitives; and (iii) to insulate the user from technical implementation details in the choice between cellular automata and agent-based models.


The concrete animat operations proposed in DSL3S are:

• Emerge: sets the conditions under which a new animat instance can appear in the simulation, i.e., the act of “birth”; an example may be an urban development simulation where the emergence of new urban units is possible in an area that meets a certain set of criteria, like distance to transport infrastructure or topography.
• Move: relates an animat with spatial variables or with other animats, determining the locations that are more or less favourable to be in.
• Replicate: captures the conditions under which an animat replicates itself, such as an organism in a biological simulation producing offspring.
• Supply: provides access to the internal attributes of an animat, thus making resources or information available to other animats. It is the supply side of an interaction between animats.
• Harvest: an operation that allows an animat to collect resources or information from other elements in its neighbourhood; it may concern other animats, targeting their attributes, or spatial variables. Between animats it is the demand side of an interaction, the counterpart of Supply. Examples may be wildfire consuming biomass or the seizure of resources from another animat, as in a predator-prey simulation.
• Perish: defines the circumstances under which an animat may cease to exist during simulation; examples can be an animal starving in a biological simulation or a fire extinguishing.

Figure 4.1 presents these key constructs in a conceptual model. A Simulation is composed of a set of Spatial and Global variables plus a set of Animats; the latter are composed of a set of Attributes that characterise their internal state. An animat acts through different types of Operations, which can induce changes on global and spatial variables, or be employed to interact with other animats. Different Animat configurations can be assigned to a Simulation, each creating a different simulation scenario.
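The composition described above can be pictured as a plain data structure. The sketch below is only an illustration under stated assumptions: the class names, the field names and the "intensity" attribute are hypothetical, and the code is not the DSL3S UML profile nor anything generated by its tooling. It builds a tiny wildfire model in which a fire animat harvests the biomass spatial variable.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative data structure only; all names are assumptions, not the DSL3S profile. */
final class MetaModelSketch {

    /** The six predefined animat operations. */
    enum OperationType { EMERGE, MOVE, REPLICATE, SUPPLY, HARVEST, PERISH }

    static final class Attribute {
        final String name;
        double value;
        Attribute(String name, double value) { this.name = name; this.value = value; }
    }

    static final class Operation {
        final OperationType type;
        final String target; // name of the related variable or attribute
        Operation(OperationType type, String target) { this.type = type; this.target = target; }
    }

    static final class Animat {
        final String name;
        final List<Attribute> attributes = new ArrayList<Attribute>();
        final List<Operation> operations = new ArrayList<Operation>();
        Animat(String name) { this.name = name; }
    }

    static final class Simulation {
        final List<String> spatialVariables = new ArrayList<String>();
        final List<String> globalVariables = new ArrayList<String>();
        final List<Animat> animats = new ArrayList<Animat>();
    }

    /** Builds a minimal wildfire scenario; the "intensity" attribute is a made-up example. */
    static Simulation wildfire() {
        Simulation sim = new Simulation();
        sim.spatialVariables.add("biomass");
        sim.globalVariables.add("windDirection");
        Animat fire = new Animat("fire");
        fire.attributes.add(new Attribute("intensity", 1.0));
        fire.operations.add(new Operation(OperationType.HARVEST, "biomass"));
        fire.operations.add(new Operation(OperationType.PERISH, "intensity"));
        sim.animats.add(fire);
        return sim;
    }
}
```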

4.2 Concrete Syntax

The DSL3S UML profile gives body to the abstract syntax outlined above, with constructs defined as UML stereotypes. The stereotype Simulation is used to host definitions such as the spatial extent of simulation. It binds together all the other elements, as an entry point to the simulation. The stereotype Global is intended to be a simple scalar value that can vary with time. It may, for instance, be set randomly at simulation start and/or made to evolve randomly at each time step. It can also be fed into the model as a predefined time-series of values, for instance read from a text file.


Figure 4.1: DSL3S meta-model.

The Spatial stereotype is essentially a stub for the input of geo-referenced data. Each instance corresponds to a spatial layer, which can be in either raster or vector format. This stereotype also provides means for the fully random generation of spatial variables, which may be useful for prototyping with synthetic scenarios.

The stereotype Animat is an aggregation of attributes existing at a perfectly identifiable location in space. The Attribute stereotype is a single characteristic of the Animat, representable by an atomic data type, such as an integer or a boolean (e.g. population in an urban development simulation). The initial number of animat instances of each type, and their spatial positioning, can be provided by a specified geo-referenced data set. The values of Attribute elements can also be initialised with the same spatial data set, through its attribute table. These initial Animat and Attribute settings can also be randomly generated for simulations where that applies.

The animat operations identified previously are also stereotypes in DSL3S; in detail:

• Emerge: this stereotype defines neighbourhood thresholds relative to spatial variables, or relative to other animat attributes, above which the emergence of a new animat becomes possible. When a new animat is created, its initial state is set according to the parameters set in the Animat class itself.
• Move: this stereotype provides properties to weight the relevance of each related element influencing the movement of an animat. For instance, in a predator-prey simulation the movement of a “sheep” animat may be positively weighted in relation to a “grass” Spatial layer and negatively weighted in relation to a “wolf” animat.
• Replicate: this stereotype provides properties to set replication thresholds against the animat internal state. The impact on the reproducing animat and the inheritance of attribute values by the new animat can also be modelled with specific properties. As with the Emerge operation, the initial state of a new animat resulting from a replication is set according to the properties of the Animat element itself.
• Supply: together with Harvest, this stereotype provides ways for animats to exchange assets. It makes the information or resource held in a particular Attribute available to other animats. A limit may be set on the amount of this asset that another animat may acquire in each interaction.
• Harvest: this stereotype provides properties to parametrise how an interaction impacts the harvesting Animat. This is modelled with a harvest rate or harvest amount. A consumption rate of 100% may be used to model preying relationships, whereas 0% can be used to simply collect information on neighbouring animats and variables.
• Perish: this stereotype defines an interval of values relative to an Attribute element, determining the conditions for the existence of the Animat itself.

An exhaustive account of the properties present in each stereotype can be found in Appendix A. Other operation stereotypes can be added to the language in the future if necessary. DSL3S is a language conceived to remain open to extension, and the addition of further operation types is the most likely path of development.
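To make the Supply, Harvest and Perish properties more tangible, the sketch below shows one way such parameters could be evaluated at run time. It is an assumption-laden illustration: the method names are hypothetical, the harvest rate is taken to be a fraction of the supplied attribute capped by the supply limit, and the interval bounds are treated as inclusive. The arithmetic actually produced by the model-to-code transformation may differ.

```java
/** Hypothetical sketch; names and exact arithmetic are assumptions, not MDD3S output. */
final class OperationSketch {

    /**
     * Amount transferred from supplier to harvester in one interaction:
     * the harvest rate (0.0 to 1.0) applied to the supplied attribute,
     * capped by the limit set on the Supply side.
     */
    static double harvestAmount(double supplied, double harvestRate, double supplyLimit) {
        if (harvestRate < 0.0 || harvestRate > 1.0) {
            throw new IllegalArgumentException("harvestRate must be within [0, 1]");
        }
        return Math.min(supplied * harvestRate, supplyLimit);
    }

    /** True when the attribute value has reached either bound of the Perish interval. */
    static boolean shouldPerish(double value, double lowerThreshold, double upperThreshold) {
        return value <= lowerThreshold || value >= upperThreshold;
    }
}
```

Under these assumptions a rate of 1.0 with no effective limit takes the whole asset, as in a preying relationship, while a rate of 0.0 transfers nothing and only reads the neighbour's state.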

4.3 Structural Semantics

The correct development of a simulation model with DSL3S must follow a set of rules regarding the valid associations between the different language constructs. Table 4.1 synthesises these rules, indicating which associations are valid and their respective cardinalities. A more thorough description of these rules follows.

Table 4.1: Valid relationships in DSL3S with respective cardinalities.

Simulation Spatial Global Animat Attribute Emerge Move Replicate Supply Harvest Perish
1 1 1..N -
0..N 0..1 0..1 0..1 -
0..N 0..1 0..1 -
0..N 1 1 1 1 1 1 1
0..N 0..1 0..1 1 1 1 1
0..N 0..N 0..N 0..N -
0..N 0..N 0..N -
0..N 0..N -
0..N 0..N 1 -
0..N 0..N 0..N 0..N 1..N -
0..N 0..N -

Each DSL3S model must contain exactly one Simulation construct. Each Animat, Global and Spatial element composing the model must be associated with the Simulation. Spatial and Global variables represent passive constructs, but may appear associated with animat operation constructs, in such cases becoming sources of information and resources to Animat elements. As for Attribute constructs, they must always be associated with exactly one Animat (the owner). An Animat aggregates Attribute elements, defining its internal composition. Animats do not link directly to any of the information constructs, Spatial or Global, nor to other Animats. All relationships between an Animat and other elements in a simulation are made through its operation constructs. Reciprocally, each operation construct must be associated with its owner Animat element.

A Move construct associates an Animat with other spatial objects. It can create a relationship with an Attribute of another Animat or with a Spatial variable. Each Move construct can only quantify propensity for movement regarding one other model element; therefore it must link to exactly one other construct apart from its owner.

Emerge constructs are subject to rules similar to those applying to the Move operation. They must always link an Animat (its owner) with another construct in the model. Beyond Attribute and Spatial constructs, Emerge can also associate its owner Animat with a Global construct.

The Supply construct must always be associated with an Attribute, to which it provides access. It can then be associated with multiple Harvest constructs that access the resource or information supplied. Harvest must also always be associated with an Attribute that stores the collected resource or information. On the other end it may associate with a single other construct among the types: Supply (in case the harvested target is an Animat), Global or Spatial.

Replicate and Perish constructs are simpler, since each must be linked to a sole Attribute construct, creating the boundary conditions for the respective operation. They cannot be associated with any other construct, and thus can only take part in a single association in the model.
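Some of the rules above lend themselves to automatic checking. The following sketch verifies three of them (exactly one Simulation, one owner Animat per Attribute, and one owner plus exactly one Attribute or Spatial target per Move) on a toy model held as names, stereotypes and undirected associations. The representation and all identifiers are hypothetical; this is not the validation mechanism of any UML tool.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical structural checker covering a subset of the rules; illustration only. */
final class StructureCheck {

    /** Element name mapped to the name of the stereotype applied to it. */
    final Map<String, String> stereotypes = new HashMap<String, String>();
    /** Undirected associations, stored as pairs of element names. */
    final List<String[]> associations = new ArrayList<String[]>();

    void element(String name, String stereotype) { stereotypes.put(name, stereotype); }

    void associate(String a, String b) { associations.add(new String[] { a, b }); }

    /** Counts elements with the given stereotype that are associated with the named element. */
    int linked(String name, String stereotype) {
        int n = 0;
        for (String[] pair : associations) {
            String other = pair[0].equals(name) ? pair[1]
                         : pair[1].equals(name) ? pair[0] : null;
            if (other != null && stereotype.equals(stereotypes.get(other))) {
                n++;
            }
        }
        return n;
    }

    /** Returns one message per violated rule; an empty list means the checks passed. */
    List<String> validate() {
        List<String> errors = new ArrayList<String>();
        int simulations = 0;
        for (Map.Entry<String, String> e : stereotypes.entrySet()) {
            String name = e.getKey();
            String st = e.getValue();
            if (st.equals("Simulation")) {
                simulations++;
            }
            if (st.equals("Attribute") && linked(name, "Animat") != 1) {
                errors.add(name + ": an Attribute needs exactly one owner Animat");
            }
            if (st.equals("Move")) {
                if (linked(name, "Animat") != 1) {
                    errors.add(name + ": a Move needs exactly one owner Animat");
                }
                if (linked(name, "Attribute") + linked(name, "Spatial") != 1) {
                    errors.add(name + ": a Move must link to exactly one Attribute or Spatial");
                }
            }
        }
        if (simulations != 1) {
            errors.add("a model needs exactly one Simulation, found " + simulations);
        }
        return errors;
    }
}
```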

4.4 Model Organisation - Views

Models developed with DSL3S can become visually complex if a single diagram is used to represent all elements, properties and associations. To avoid such clutter, and to provide a thorough structure for the development and presentation of models with the language, a multi-view approach is proposed. These views are intended to display the model in such a way that each aspect of a simulation is presented in a specific diagram; these are the Simulation, Animat, Animat Interactions and Scenario views, shown in Figure 4.2.

The Simulation View contains model settings and participating variables. It includes the single Simulation construct plus the necessary Global and Spatial elements.

The Animat View provides a container in which to define the structure of an animat. In this view an Animat and the Attribute constructs belonging to it should be present, plus any associations to Spatial or Global elements. This includes any operations linked to these elements: Emerge, Move, Replicate or Perish. One view of this kind per animat is recommended, thus visually encapsulating its configuration.

The Animat Interaction View is used to describe operations between animats. It should contain all the Supply and Harvest constructs relating two (or more) animats, plus associated Attribute elements. Move operations relative to other animats may also be set in this view.

Lastly, there is the Scenario View, where animats are assigned to a simulation. In this way, the designer may explore different animat configurations that can be used in different runs of the same simulation.

This multi-view structure is recommended, but does not necessarily have to be followed when developing a model with DSL3S. The user is free to use alternative organisations that may be considered more appropriate in specific cases.

Figure 4.2: DSL3S model views.

4.5 Proposed Icons

A set of icons is also proposed together with the language to help render models developed with DSL3S visually more explicit (vide Figure 4.3). For Simulation, Global and Spatial elements, direct pictorial representations of their concepts are used. For the stereotypes Animat and Attribute, more abstract symbols are proposed, intending to create mental associations with a simulation model. The Animat icon is also a pictorial allusion to a small bug, which is then portrayed in different situations easily associated with each operation stereotype. These icons should provide for models that clearly differ from the standard UML aspect. However, their usage was not fully supported in the MDD framework used to implement the prototype (described in Chapter 5); therefore, all DSL3S models presented hereafter appear in plain UML.

Figure 4.3: The icons proposed for the DSL3S stereotypes (Simulation, Spatial, Global, Animat, Attribute, Emerge, Move, Replicate, Supply, Harvest, Perish).

5 The DSL3S Framework

Contents
5.1 MDD3S - Prototype Implementation
5.2 Resulting Source Code
5.3 Support Contents

This chapter describes the prototype implementing DSL3S and the overall development framework proposed. Section 5.1 describes the tools used to develop the prototype; Section 5.2 presents the structure of the code generated by this prototype; Section 5.3 introduces the accompanying materials that guide and aid model development.

5.1 MDD3S - Prototype Implementation

“Model Driven Development for Spatial Simulation Scenarios” (MDD3S) is the name of the prototype framework that supports the DSL3S language [de Sousa and Silva, 2012, de Sousa and Silva, 2015a, de Sousa and Silva, 2016]. MDD3S relies solely on open source tools (see Figure 5.1): (i) Papyrus - an Eclipse1 plug-in for UML modelling fully supporting UML profile development; (ii) Acceleo - another Eclipse plug-in supporting the development of model-to-code transformations; (iii) MASON - a Program-level spatial simulation tool-kit used as a library by the code produced by the transformation; and (iv) GeoMASON - an extension to MASON providing a programming interface specific to spatial data (including data input/output functionality). This section reviews some relevant aspects of these technologies in the scope of the MDD3S framework.

Figure 5.1: The technologies used to implement MDD3S.

1 http://www.eclipse.org/modeling


5.1.1 Papyrus

Papyrus2 is an open source project started by the Commissariat à l'Énergie Atomique in France, with the aim of producing an advanced graphical editor for the UML language. It is based on the Eclipse Modelling Framework3 (EMF), supporting the editing and visualisation of structured models stored using the XMI standard. It is installable as a plug-in, enabling a dedicated Perspective for graphical model development, shown in Figure 5.2. Papyrus also provides a set of Java classes to facilitate model manipulation. Presently Papyrus is close to fully supporting version 2 of UML, enabling the development of ad hoc DSLs through the definition of UML profiles.

Figure 5.2: Main views of the Papyrus perspective in Eclipse: Model Explorer, Model Canvas and UML Palette.

The choice of Papyrus was not without difficulty, in particular with the earlier versions shipped with the Eclipse Helios and Indigo releases. Despite visible improvements in usability and functionality since then, relevant issues remained that prevented the fulfilment of certain aspects of DSL3S, such as the usage of icons and, most importantly, model validation. However, two important characteristics justified the persistence in its adoption. In the first place, its full integration with Eclipse; by providing access to the powerful plug-in mechanism of this platform, it greatly facilitated the public distribution of DSL3S and the integration with MDD3S. Secondly, by being a free and open source tool, Papyrus made user evaluation (vide Section 6.2) considerably simpler. This was especially important since that later phase of the project was entirely reliant on voluntary participation.

2 http://www.eclipse.org/modeling/mdt/papyrus/
3 http://www.eclipse.org/modeling/emf/

Table 5.1: The Java service hasLinkedStereotype used in MDD3S to determine if a model element is linked to elements of a specific type.

    // Returns true if any element associated with class c has the named stereotype applied.
    // Class, Association, Element and Stereotype are the org.eclipse.uml2.uml types;
    // EList is org.eclipse.emf.common.util.EList and List is java.util.List.
    public boolean hasLinkedStereotype(Class c, String linkedStereotype) throws IOException {
        EList<Association> associations = c.getAssociations();
        for (Association ass : associations) {
            // Every element taking part in the association, including c itself.
            EList<Element> elems = ass.getRelatedElements();
            for (Element elm : elems) {
                List<Stereotype> stereotypes = elm.getAppliedStereotypes();
                for (Stereotype stereotype : stereotypes)
                    if (stereotype.getName().equals(linkedStereotype))
                        return true;
            }
        }
        return false;
    }

5.1.2 Acceleo

Acceleo4 is an open source code generator created by the French company Obeo. It is also built on EMF, facilitating interoperability with several other EMF-based modelling tools. Acceleo interprets the Meta-Object Facility Model to Text Transformation language5 (MOFM2T), also an OMG standard. Though not yet fully implementing MOFM2T, the model-to-code transformations produced with Acceleo are today possibly the closest to the scheme proposed by the OMG.

The model-to-code transformation mechanism is based on special files called templates, which define the text output to produce from graphical models. They are composed of regular text plus a series of annotations that are substituted by values and names of model elements at transformation time. Traditional computational constructs such as branches or loops can also be used through specific annotations, producing more complex transformations. Templates can be articulated through an inclusion mechanism, whereby a master template can make use of several other templates, creating a transformation chain. When fully developed, a transformation chain can be packaged into an independent plug-in for Eclipse, facilitating its portability and application.

Acceleo 3 fully supports model-to-code transformation from meta-models, identifying stereotypes applied to classes and providing access to their properties. The latter is not based on MOFM2T, but provided by a service, essentially a Java method that browses through the UML object model associated with each class, as exemplified in Table 5.1. When a transformation chain is applied to a model, all its elements are run through the several templates declared in the master. Typically, each template file filters the elements, generating code only for those with a specific stereotype applied. Such is the case with MDD3S: a template named Simulation, for example, generates the code for elements with the stereotype of the same name.
4 http://www.acceleo.org/pages/introduction/en
5 http://www.omg.org/spec/MOFM2T/1.0/
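The substitution of annotations by model values can be mimicked in a few lines of Java. The sketch below is a deliberately simplified stand-in, not Acceleo and not MOFM2T: it only replaces placeholders of the form [name/] with values from a map, whereas real templates evaluate expressions over the model. The class name, the placeholder keys and the sample values are all hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy placeholder expansion for illustration; it does not use or emulate the Acceleo API. */
final class TemplateSketch {

    /** Matches annotations such as [animat/]: a word followed by a slash, in square brackets. */
    private static final Pattern PLACEHOLDER = Pattern.compile("\\[(\\w+)/\\]");

    /** Replaces every [key/] in the template with the value bound to key. */
    static String expand(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no value bound to " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Binding, say, attr to "PreyEnergy", op to "PreyPerish" and animat to "Prey" in the line "if (attribute[attr/] >= upperThresh[op/]) sim.addTo[animat/]Garbage(this);" would yield a Java statement naming the corresponding fields and method of the generated class.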


Table 5.2: The MDD3S template for the Perish stereotype.

Acceleo transformation template

[template public behavPerish(c : Class) ? (c.hasStereotype('Animat'))]
protected void perish(Sim sim) {
[for (ass:Association | c.getAssociations())]
  [for (s:Element | ass.relatedElement)]
    [let sClass: Class = s.oclAsType(Class)]
      [if (sClass.hasStereotype('Attribute'))]
        [for (assP:Association | sClass.getAssociations())]
          [for (p:Element | assP.relatedElement)]
            [let pClass : Class = p.oclAsType(Class)]
              [if (pClass.isNotNull())]
                [if (pClass.hasStereotype('Perish'))]
                  [if pClass.getTaggedValue(pClass, 'Perish', 'upperThreshold').isNotNull()]
                  if(attribute[sClass.name/] >= upperThresh[pClass.name/]) sim.addTo[c.name/]Garbage(this);
                  [/if]
                  [if pClass.getTaggedValue(pClass, 'Perish', 'lowerThreshold').isNotNull()]
                  if(attribute[sClass.name/] [...]

[...] = upperThreshPreyPerish) sim.addToPreyGarbage(this);
if(attributePreyEnergy [...]

[...] New > Other, search for Java Project and click Next.
   4.2 Type a name for the project (e.g. “Simple Test”) and click Next.
   4.3 In the Source tab click Create new source folder in the Details section. Call this new folder “src-gen” and click Finish in the new dialogue.
   4.4 Add references to the Java libraries downloaded before (Libraries tab > Add External JARs...); finally click Finish.

Figure 1 – The Libraries tab in the New Java Project wizard after adding external JARs.

5.
   5.1 Keep (or switch to) the Papyrus perspective. If the Papyrus perspective button is not visible use the menus: Window > Open Perspective > Other... .


Figure 2 – The perspectives selector is found at the top right corner; this exercise is to be conducted in the Papyrus perspective.

6. Create a new folder in the project to contain the spatial data sets.
   6.1 Right click the project in the Project Explorer view and select New > Folder; type “data” for Folder name and click Finish.
   6.2 Expand the datasets package downloaded in point 3 and copy its contents into the new data folder.
7. Create a new Papyrus model (File menu > New > Papyrus Model).
   7.1 Name it “SimpleTest.di” and make sure the “SimpleTest” project is selected; click Next >.
   7.2 Select UML as language and Class Diagram for the first diagram; name it “Simulation”.
8. Click on the diagram (large blank space in the middle) and in the Properties view select the Profile tab.
   8.1 Click the registered profile button and then select DSL3S in the following menus.

Figure 3 – The Properties view is found below the diagram space.

9. Add a new class by dragging and dropping the Class element from the Palette to the diagram and give it a suggestive name such as “MySimulation”.
   9.1 With the new class element selected, access the Profile tab in the Properties view and click the stereotype application button. Select the Simulation stereotype in the following menu, add it to the Applied Stereotypes box with the arrow buttons and click OK.


   9.2 In the Profile tab expand the Simulation item to view its properties.
   9.3 Click on the simulName property and in the text input box to the right type something like “Predator-Prey prototype”.
   9.4 Edit spaceWidth and spaceHeight, setting both to “100”.
10. Add a new class to the diagram, naming it “Pasture”.
   10.1 Apply the Spatial stereotype on it; set the inputLayer property to “data/Pasture.agrid” and stepVariation to “0.5”.
   10.2 Set colourMin to “224,224,128” and colourMax to “32,128,32”.
   10.3 Link “Pasture” to the Simulation element with an association edge.
11. Add a new class to the diagram, and name it “Prey”.
   11.1 Apply the Animat stereotype on it and set the inputLayer property to “data/Prey.shp” and wanderer to “true”.
   11.2 Set colourMin to “64,64,255” and colourMax to “32,32,186”.
   11.3 Associate “Prey” to the Simulation element.

Figure 4 – The Properties view for an element with an applied stereotype. By clicking on a property it is possible to change its value.

12. Open the Model Explorer view and right click on the root of the model tree. Select New Diagram > Create new UML Class Diagram and name it “Prey”.
   12.1 Drag-and-drop the “Prey” element from the model tree into this new diagram.
   12.2 Drag-and-drop the “Pasture” element in the same way.


Figure 5 – The Model Explorer view is found on the left, together with or below the Project Explorer view.

13. Create a new class, name it “PreyEnergy”.
   13.1 Apply the Attribute stereotype on it. Set the inputAttribute property to “Energy”, the stepVariation property to “-20”, maxValue to “50” and display to “true”.
   13.2 Associate “PreyEnergy” to the “Prey” element.
14. Create a new class and name it “Graze”.
   14.1 Apply the Harvest stereotype on it, and set the percentHarvested property to “100”.
   14.2 Associate “Graze” with “PreyEnergy” and then with “Pasture”.
15. Save the model and generate code from it.
   15.1 In the Project Explorer view expand the “SimpleTest” model item and right-click the uml element; select MDD3S > Generate simulation from DSL3S model.
   15.2 Expand the src-gen folder and search for the GUI class (e.g. SimpleSimGUI.java). Right click and select Run As > Java Application.
   15.3 In the simulation window click the play button and observe the prey (blue dots) wandering and grazing the pasture.
16. Add a new class to the “Prey” diagram and call it “PreyReplicate”.
   16.1 Apply the Replicate stereotype and set lowerThreshold to “-1”, upperThreshold to “40”, toll to “20” and inheritance also to “20”.
   16.2 Associate “PreyReplicate” to the “PreyEnergy” attribute.


17. Add another class, calling it “PreyPerish”.
   17.1 Apply the Perish stereotype and set lowerThreshold to “0” and upperThreshold to “51”.
   17.2 Associate “PreyPerish” to the “PreyEnergy” attribute.
18. Prey must seek the best pasture to survive. Create a class named “Prefer”.
   18.1 Apply the Move stereotype to it, setting weight to “1” and scope to “1.5”.
   18.2 Associate “Prefer” to “Prey” and then to “Pasture”.
19. Generate the code again and run it. Observe the prey replicating and rapidly grazing the pasture.
20. Switch to the “Simulation” diagram and add a new class, named “Predator”.
   20.1 Apply the Animat stereotype, set wanderer to “true” and initNum to “25”.
   20.2 Set colourMin to “255,64,64” and colourMax to “192,0,0”.
   20.3 Associate “Predator” with the Simulation element.
21. Create a new diagram named “Predator” and drag-and-drop “Predator” and “PreyEnergy” into it.
22. In the new diagram add a new class called “PredEnergy”.
   22.1 Apply the Attribute stereotype, setting initValue to “50”, maxValue to “100”, stepVariation to “-1” and display to “true”.
   22.2 Associate “PredEnergy” with “Predator”.
23. Add another class called “PredReplicate”.
   23.1 Apply the Replicate stereotype and set lowerThreshold to “-1”, upperThreshold to “60”, toll to “30” and inheritance also to “30”.
   23.2 Associate “PredReplicate” to the “PredEnergy” attribute.
24. Add yet another class called “PredPerish”.
   24.1 Apply the Perish stereotype and set lowerThreshold to “0” and upperThreshold to “101”.
   24.2 Associate “PredPerish” to the “PredEnergy” attribute.
25. To instruct “Predator” to seek prey, add another class and call it “Seek”.
   25.1 Apply the Move stereotype, set weight to “1” and scope to “1.5”.
   25.2 Associate “Seek” with “Predator” and then with “PreyEnergy”.
26. Create a new diagram named “AnimatInteractions”; drag-and-drop the “PredEnergy” and “PreyEnergy” elements.
27. In the new diagram add a new class named “FeedPredator”.
   27.1 Apply the Supply stereotype and leave all properties at their default values.
   27.2 Associate “FeedPredator” to “PreyEnergy”.
28. Add another class named “EatPrey”.
   28.1 Apply the Harvest stereotype, set the percentHarvested property to “100” and scope to “0.5”.
   28.2 Associate “EatPrey” to “PredEnergy”.
29. Associate “FeedPredator” with “EatPrey”; “Predator” can now feed itself.
30. Generate the code again and run it. Observe predator animats feeding off the excess of prey.
31. Switch to the “Simulation” diagram and add a class named “Inaccessible”.
   31.1 Apply the Spatial stereotype, set inputLayer to “data/Polygons.shp” and initValue to “1”.
   31.2 Set colourMin and colourMax both to “160,160,160”.
   31.3 Associate “Inaccessible” with “Simulation”.
32. Switch to the “Prey” diagram and drag-and-drop the “Inaccessible” element.
33. Create a new class named “Avoid”.
   33.1 Apply the Move stereotype, setting weight to “-1000” and scope to “1.5”.
   33.2 Associate “Avoid” to “Prey” and then to “Inaccessible”.
34. Generate the code once again and observe that prey animats now avoid the polygons representing inaccessible areas.
35. If the “Inaccessible” areas are not visible they are probably hidden by the “Pasture” Spatial element.
   35.1 Transparency can be set on “Pasture” by adding a fourth parameter to the colourMin and colourMax properties, e.g. “224,224,128,128”.
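The effect of the values set on “PreyEnergy” (step 13) and “PreyPerish” (step 17) can be sketched in plain Java. This is only an illustration under stated assumptions, not the code MDD3S generates: the class `PreyEnergySketch` and its methods are hypothetical, the upper-bound test follows the comparison shown in Table 5.2, while the lower-bound comparison and the capping of the attribute at maxValue are assumptions made for this example.

```java
// Hypothetical illustration only: a prey energy attribute with Perish thresholds.
class PreyEnergySketch {

    // Values taken from steps 13 and 17 of the exercise.
    static final double STEP_VARIATION = -20.0;
    static final double MAX_VALUE = 50.0;
    static final double LOWER_THRESHOLD = 0.0;
    static final double UPPER_THRESHOLD = 51.0;

    private double energy;

    PreyEnergySketch(double initialEnergy) {
        this.energy = Math.min(initialEnergy, MAX_VALUE);
    }

    // Applies the per-step variation; capping at MAX_VALUE is an assumption.
    void step() {
        energy = Math.min(energy + STEP_VARIATION, MAX_VALUE);
    }

    // Adds harvested energy (e.g. from grazing); capping at MAX_VALUE is an assumption.
    void harvest(double amount) {
        energy = Math.min(energy + amount, MAX_VALUE);
    }

    // Upper test as in Table 5.2; the lower test (<=) is assumed by symmetry.
    boolean shouldPerish() {
        return energy >= UPPER_THRESHOLD || energy <= LOWER_THRESHOLD;
    }

    double energy() {
        return energy;
    }

    public static void main(String[] args) {
        PreyEnergySketch prey = new PreyEnergySketch(50.0);
        for (int i = 1; i <= 3; i++) {
            prey.step();
            System.out.println("step " + i + ": energy=" + prey.energy()
                    + " perish=" + prey.shouldPerish());
        }
    }
}
```

Under these assumptions an upperThreshold of “51” can never be reached with a maxValue of “50”, so in this exercise prey would only perish by starvation; a prey that does not graze runs out of energy within three steps.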

Please fill in the “DSL3S Questionnaire” available online: http://goo.gl/forms/LQMClXU834

Further Info:
• Wiki: https://github.com/MDDLingo/DSL3S/wiki
• The article: de Sousa, L. and da Silva, A. R., “Preliminary Design and Implementation of DSL3S”, CAMUSS - International Symposium on Cellular Automata Modelling for Urban and Spatial Systems, Oporto, 2012. http://isg.inesc-id.pt/alb/static/papers/2012/C115-ls-CAMUSS-2012.pdf

Thank you for participating!


C Evaluation Questionnaire


DSL3S – Domain Specific Language for Spatial Simulation Scenarios
Simple Test Session – Questionnaire, v1.0, 2015

This survey intends to collect feedback on the Simple Test Session performed with DSL3S.

Area of Expertise:
• Computer Science
• Geography / GIS
• Environmental Sciences
• Social Sciences
• Economics
• Other

Degree:
• BSc
• MSc
• PhD

Age:
• Less than 23 years
• Between 23 and 30 years
• More than 30 years

Gender:
• Female
• Male

Profession:
• Academic / Faculty member
• Researcher
• Computer / Software Engineer
• Geographic / Environment Engineer
• Other

Previous experience with UML:
• Yes
• No

Previous experience with Eclipse:
• Yes
• No

Previous experience with Spatial Simulation:
• Yes
• No

Please answer the following questions using the scale from 1 (very low) to 5 (very high), or N/A for not available, not relevant, or do not know.

DSL3S
Evaluation of the language (defined as a UML profile)
Scale: 1 (Very Low) to 5 (Very High), or N/A
• How suitable is the number of concepts in the language?
• How easy to use is the notation chosen (UML Profile)?
• How easy is it to learn the language?
• How suitable is the language for the Spatial Simulation development domain?

Development Framework
Evaluation of the framework (Eclipse and plug-ins)
Scale: 1 (Very Low) to 5 (Very High), or N/A
• How do you rate the usability of Eclipse with the DSL3S and MDD3S plug-ins?
• How do you rate the usability of the Papyrus Model Editor (stereotype application, graphical model browsing)?
• How do you rate the usability of the code generation method?
• How do you rate the development process as a whole?

General Approach
Evaluation of the MDD approach used in DSL3S
Scale: 1 (Very Low) to 5 (Very High)
• How much can DSL3S help domain experts lacking programming skills prototype their own simulations?
• How much can DSL3S (or a similar approach) help in communicating spatial simulation models to stakeholders?
• Can DSL3S (or an MDD approach) be the basis for a standard language to describe spatial simulation models (considering in particular communication with peers)?
• Would you consider such a tool for development or prototyping of your own spatial simulation projects?

Additional Comments (Suggestions, Problems, Bugs):

Thank you for your contribution!
