
Paper Reference Number: 47

Collaborative Agricultural Research Environment Based on Open Source Geospatial Web Technologies

Ebrahim Jahanshiri1, Sue Walker2
Crops For the Future, The University of Nottingham Malaysia Campus, Jalan Broga, 43500 Semenyih, Selangor Darul Ehsan, Malaysia
1 [email protected], 2 [email protected]

Dr. Ebrahim Jahanshiri is research program coordinator for CropBASE at Crops For the Future Research Centre, Semenyih, Malaysia. He has experience in precision farming and the use of geographic information science (GIS) and remote sensing in agriculture. He worked on many projects as research assistant, research fellow and research collaborator in Malaysia and Singapore prior to obtaining his PhD in GIS and Geomatics Engineering from Universiti Putra Malaysia. He is a member of (FOAS), Pedometrics and OSGeo.

Professor Sue Walker is Research Theme Leader for Agrometeorology and Ecophysiology and the Programme Director for CropBASE & SystemPLUS at Crops For the Future, Semenyih, Malaysia. She has wide experience in agrometeorology across Southern and East Africa, including crop-climate modelling and Agromet services for farmers, focusing on semi-arid conditions. She is Professor Emeritus for Agricultural Meteorology at the University of the Free State, in Bloemfontein, South Africa.

Abstract

The invention of the Internet is a milestone in human history. Internet and web technologies are now increasingly used for sharing and disseminating knowledge, and the scientific community also uses these technologies for collaboration. Despite the usefulness of web-based systems, initiatives for development of online environments that can suit researchers in areas such as data management and statistical analysis are scarce. Given the importance of geospatial data, there have been initiatives for development of data sharing standards over the past decade. Many agricultural research datasets fall into the realms of space and/or time, and therefore researchers can use these geospatial standards and web-based sharing mechanisms for exchange of information. CropBASE is Crops For the Future’s platform focusing on underutilised crops, incorporating a web-based geospatial collaborative research environment (CRE). The purpose of the CRE is to increase collaboration and decrease duplication and inconsistencies in research. The tools provided within the CRE system are meant to advance research on underutilised crops around the world. The core part of the CRE system is data exchange among research groups. It uses available open source web technology and geospatial standards for agricultural research data. Further, the CropBASE platform aims to include online tools that can assist researchers with data visualization and statistical analysis within the geospatial environment. The web-based CRE also incorporates other modules, like crop modelling web interfaces, that are currently under development in other projects around the world to promote research on underutilised crops. This paper provides an update on the development of this CRE, and CropBASE would welcome interaction and opportunities for future cooperative projects.

Keywords: Collaborative research environment (CRE), geospatial, data standards, CropBASE

Research Data

A research cycle starts with identifying the research area or knowledge gap, designing the experiments and carrying out the research. The next stages, namely analysing research results and publishing new findings, are the most crucial in each study. During these stages, numerous data and documents are created and shared between the scientists. Organizing research data and documents adds another layer of complexity to the research operation. The development of the Internet has greatly facilitated post-experiment communication between research parties around the world. Email, FTP, telnet and later the World Wide Web have shown great potential for communication of research data and documents. As a result, text, data, video and other content can now be sent and received at greater speed, helped by improvements under way in most parts of the world to the speed, stability and reliability of the Internet. Despite advances in data transfer technology, sharing data between researchers and organizations that are involved with any type of data is still a challenge.
This leads to duplication and under-utilisation of data that might otherwise have been used for new research. It also affects the reliability of research, as replication of data analysis is either not possible or not required [1]–[3]. This delays the process of converting data to knowledge. Decisions about sharing data and its mechanisms, both in data format and method of transfer, therefore usually rest on researchers who might not have adequate training and knowledge in this area.

Data Standards

Along with the development of the Internet, standards for data collection, document writing and communication have also evolved. As a result, a variety of data collection standards are now available that can be used to facilitate experimental design in research. One major improvement in data sharing is the development of metadata and, later, metadata standards. Metadata, as ‘data about data’, improves the readability and application of data and facilitates the discovery of new questions. The geographic information systems (GIS) community, which deals with large datasets, has long adopted international data standards in all aspects of geographic data creation, collection and sharing. As a result, there are now means available both on the Internet (web and FTP based) and outside it to share data and metadata. Many of the metadata standards that were developed for geographic data can also be used for data sharing in other disciplines and vice versa. Currently, many data services are available on the Internet. However, most of these systems are developed using different methodologies and technologies.

Agricultural Data and Standards

Agricultural research comprises a vast array of data types. Many of these data can be analysed using spatial and time-based analysis methods, and therefore sharing policies for GIS data can be applied to them. In contrast, other data consisting of laboratory, genomic and trial data might have different characteristics. There are a variety of data formats and standards that researchers in agriculture specifically need to work within. Therefore, efforts have started to standardize the agricultural vocabulary that can be used both in storing data and in describing experimental results. The Food and Agriculture Organization (FAO) of the United Nations has been actively involved in developing such standardized vocabularies through AGROVOC (aims.fao.org/agrovoc). For describing experiments, the International Benchmark Sites Network for Agrotechnology Transfer (IBSNAT) created a first set of standards. Later these standards were revised by the International Consortium for Agricultural Systems Applications (ICASA) [2]. Both of these standards are currently implemented in a relational database format and can be used in a variety of applications to facilitate data storage and sharing.

Collaborative Research Environment (CRE)

A CRE is a facility that aims to help researchers in many areas, from the design of experiments through to data analysis and reporting [4] [5]. It could encompass a physical space that is used for research activities that are mostly data and experiment related [6].
However, with the abundance of Internet services, the term is increasingly used to describe an online environment, including toolsets and other facilities, that aims to make researchers' lives easier and their work more consistent, at least in the areas of data and document creation, sharing and analysis. Currently there is no standard for naming such an environment, and different names are normally used to refer to such settings. BioCoRE is a collaborative work environment for biological research developed by the University of Illinois at Urbana-Champaign [7]. The Planets Testbed is a CRE for digital preservation [8]. Oxford University has created a website to support researchers in managing and sharing data [9]. Scholarlink is a CRE based on reading lists and tagging mechanisms [10]. DiCore is a Distributed, Integrated, and Collaborative Research Environment to facilitate scientific discovery and collaboration in all areas [11]. CARE-G is groupware that provides a group collaboration system for chemists working in a chemical collaborative research environment [12]. Finally, GeoCRE is a collaborative research environment for geographic data collection and sharing developed at the Department of Physical Geography, University of Freiburg [13].

The majority of the above examples focus on specific areas like data and document sharing and management. The aims of a CRE can grow further to include other related research and data matters. There is, therefore, an opportunity to build a CRE that goes beyond a data and document management system and extends towards creating tools for research design, data analysis, report writing, and so on.

CropBASE CRE Design

CropBASE is Crops For the Future’s platform for underutilised crops research management and knowledge dissemination. This platform is currently being designed and developed by a research group consisting of scientists and programmers. The core part of the CropBASE CRE is research data management. It will accommodate all the necessary tools and technologies related to efficient research design, administration, analysis and dissemination of data and research findings. Later, it will be accompanied by CropBASE KNOWLEDGE, which aims to provide tools that organize the knowledge on underutilised crops.

Metadata Sharing and Document Management System

Currently two types of research data management systems are being examined for implementation within the CropBASE CRE. The first is DataFlow by the University of Oxford [14], which consists of two components, DataStage and DataBank. DataStage is a file management system that creates an easy-to-use file sharing mechanism for institutional use. It goes beyond a simple file sharing protocol by also sharing metadata, offering a web interface for access from outside an organization, and so on. DataBank provides a scalable repository system for storing data within any organizational environment. It is provided as free and open source software under the MIT licence [15]. The code is written in Python (Linux-based) and its metadata is based on the Resource Description Framework (RDF). DataBank uses Dublin Core [16] as its default metadata standard.
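To make the combination of RDF and Dublin Core more concrete, the following is a minimal sketch of how a dataset description could be serialised as an RDF/XML record with Dublin Core elements. It uses only the Python standard library and is not taken from DataBank; the function name `build_dc_record`, the dataset identifier and all field values are hypothetical and chosen purely for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of RDF and of the Dublin Core element set
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_record(about, fields):
    """Return an RDF/XML string describing one resource with Dublin Core.

    `about` is the identifier of the dataset; `fields` maps Dublin Core
    element names (e.g. "title") to a string or a list of strings.
    This helper is hypothetical and not part of DataBank.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for name, value in fields.items():
        # Repeatable elements (e.g. several subjects) are passed as lists
        values = value if isinstance(value, list) else [value]
        for item in values:
            element = ET.SubElement(description, f"{{{DC_NS}}}{name}")
            element.text = item
    return ET.tostring(root, encoding="unicode")


# Hypothetical example record for a field trial dataset
example = build_dc_record(
    "urn:example:cropbase:trial-001",
    {
        "title": "Example field trial dataset",
        "creator": "Example Research Group",
        "subject": ["underutilised crops", "field trial"],
        "date": "2014-10-27",
        "format": "text/csv",
        "coverage": "Semenyih, Malaysia",
    },
)
print(example)
```

A record of this kind carries only generic descriptive fields. Agriculture-specific information, such as the experimental design, would need additional vocabulary beyond the Dublin Core elements.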
DataBank can also assign a digital object identifier (DOI) using the DataCite application programming interfaces (APIs) for easy citation of datasets in the literature. The project was under active development until the end of 2013 [17]. Given the advancements in GIS data, the other option for implementing a data repository for the CropBASE CRE is GeoNetwork opensource [18]. It is a metadata cataloguing system for geographic data based on the principles of Free and Open Source Software (FOSS). It implements international and open standards for services and protocols (ISO/TC211, OGC, Dublin Core, etc.). The first versions were initiated by FAO and other organizations. Over time, GeoNetwork opensource became a foundation for developing web-based cataloguing systems for spatially referenced data. Currently it can be installed as a standalone web platform on any type of server. The code is based on Java (www.java.com), and a small-scale server can be run on the Jetty servlet container (www.eclipse.org). Both of the above systems are currently being tested for implementation within the CropBASE platform. However, given the active development of GeoNetwork opensource, the inclination towards adopting this system has grown. In order to describe agricultural data efficiently, a specific metadata profile should be developed, based on the available standard vocabulary and experimental design, and implemented in the system. Part of the design will also be devoted to connecting the CRE database to the soil, climate and crop genotypic data currently available around the world.

Online Tools for Data Analysis and Crop Simulation

The usability of online research databases can be increased by providing online data analysis tools that are designed for specific needs. Current web technology allows for the creation of such online tools for data analysis. Open source statistical programming languages like R (www.r-project.org) can be used to provide specific tools for agricultural data analysis and mapping [19]. This will extend the reproducible research motto “code and data or it didn’t happen” [20], now practised in disciplines like statistics [21], into agricultural research. Currently, AgMIP (Agricultural Model Inter-comparison and Improvement Project) has created a variety of tools for crop simulation and modelling [22]. The group has been involved in the development of a range of tools for data preparation, translation and analysis. The libraries and APIs that have been developed for crop simulation are open source and available for implementation. These tools will be included in the online data analysis toolset to facilitate crop modelling.

Conclusion

Open source initiatives in both the geospatial and agricultural communities provide exciting opportunities to develop online data management, sharing and analysis services. Geospatial open source initiatives have resulted in the development of GeoNetwork opensource. It is a data cataloguing system and, given the spatial and temporal nature of many agricultural data, it can be used for online agricultural research data management. For other types of agricultural research data, other metadata cataloguing systems like DataFlow are also available for online research data sharing and management.
Open source software for statistical analysis can also add value to the CRE system. For example, it is possible to provide tailored mapping and statistical analysis tools for online data visualization and analysis. The CropBASE platform at Crops For the Future (CFF) aims to develop a feature-rich geospatial CRE and the accompanying tools needed to ease access for researchers working on underutilised crops.

References

[1] C. Vogeli, R. Yucel, E. Bendavid, L. M. Jones, M. S. Anderson, K. S. Louis, and E. G. Campbell, “Data withholding and the next generation of scientists: results of a national survey,” Acad. Med., vol. 81, no. 2, pp. 128–136, 2006.
[2] J. W. White, L. A. Hunt, K. J. Boote, J. W. Jones, J. Koo, S. Kim, C. H. Porter, P. W. Wilkens, and G. Hoogenboom, “Integrated description of agricultural field experiments and production: The ICASA Version 2.0 data standards,” Comput. Electron. Agric., vol. 96, pp. 1–12, Aug. 2013.
[3] S. Fomel and J. F. Claerbout, “Reproducible research,” Comput. Sci. Eng., vol. 11, no. 1, pp. 5–7, 2009.

[4] “Collaborative research environment,” 20-Feb-2013. [Online]. Available: http://www.newcastle.edu.au/research-and-innovation/resources/technology-researchcatalogue/resource-catalogue/collaborative-research-environment. [Accessed: 27-Oct-2014].
[5] V. Hinze-Hoare, “Designing a Collaborative Research Environment for Students and their Supervisors (CRESS),” ArXiv Prepr. ArXiv07081624, 2007.
[6] R. N. Goldstein, “Architectural Design and the Collaborative Research Environment,” Cell, vol. 127, no. 2, pp. 243–246, Oct. 2006.
[7] “BioCoRE - A Biological Collaborative Research Environment.” [Online]. Available: http://www.ks.uiuc.edu/Research/biocore/. [Accessed: 27-Oct-2014].
[8] B. Aitken and A. Lindley, “The Planets Testbed: A Collaborative Research Environment for Digital Preservation.” [Online]. Available: http://ercimnews.ercim.eu/en80/special/the-planets-testbed-a-collaborative-research-environmentfor-digital-preservation. [Accessed: 27-Oct-2014].
[9] “Research Data Management - University of Oxford,” Research Data Management. [Online]. Available: http://researchdata.ox.ac.uk/. [Accessed: 27-Oct-2014].
[10] G. Kazai, P. Manghi, K. Iatropoulou, T. Haughton, M. Mikulicic, A. Lempesis, N. Milic-Frayling, and N. Manola, “Architecture for a Collaborative Research Environment Based on Reading List Sharing,” in Research and Advanced Technology for Digital Libraries, M. Lalmas, J. Jose, A. Rauber, F. Sebastiani, and I. Frommholz, Eds. Springer Berlin Heidelberg, 2010, pp. 294–306.
[11] H. Su, H. Zhang, and R. Hashemi, “Distributed, Integrated, and Collaborative Research Environment (DiCore): An Architectural Design,” in Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2007. SNPD 2007, 2007, vol. 1, pp. 21–26.
[12] C. Zhao, R. Zhang, D. Yue, R. Wei, Y. Cheng, and L. Li, “Research and Implementation of Groupware in Chemical Collaborative Research Environment,” in ChinaGrid Annual Conference, 2008. ChinaGrid ’08. The Third, 2008, pp. 253–258.
[13] M. Hoschek, “GeoCRE.” [Online]. Available: http://geocre.net/about. [Accessed: 27-Oct-2014].
[14] “DataFlow project.” [Online]. Available: http://www.dataflow.ox.ac.uk/. [Accessed: 27-Oct-2014].
[15] “The MIT License (MIT) | Open Source Initiative.” [Online]. Available: http://opensource.org/licenses/MIT. [Accessed: 27-Oct-2014].
[16] “DCMI Home: Dublin Core® Metadata Initiative (DCMI).” [Online]. Available: http://dublincore.org/. [Accessed: 27-Oct-2014].
[17] “DataFlow Project - GitHub repository,” GitHub. [Online]. Available: https://github.com/dataflow. [Accessed: 27-Oct-2014].
[18] R. Ožana and B. Horáková, “Actual State in developing GeoNetwork opensource and metadata network standardization,” GIS Ostrava 2008, 2008.
[19] E. Jahanshiri and A. R. M. Shariff, “Developing web-based data analysis tools for precision farming using R and Shiny,” in IOP Conference Series: Earth and Environmental Science, 2014, vol. 20, p. 012014.
[20] T. Wallis, “Code and data, or it didn’t happen,” Practical Vision Science.
[21] R. Gentleman and D. T. Lang, “Statistical analyses and reproducible research,” J. Comput. Graph. Stat., vol. 16, no. 1, 2007.
[22] “IT Team | AgMIP.” [Online]. Available: http://www.agmip.org/it-team/. [Accessed: 27-Oct-2014].