Software Citation, Reuse and Metadata Considerations: An Exploratory Study Examining LAMMPS
Kai Li College of Computing and Informatics, Drexel University 3141 Chestnut Street, Philadelphia, PA 19104
[email protected]
Jane Greenberg College of Computing and Informatics, Drexel University 3141 Chestnut Street, Philadelphia, PA 19104
[email protected]
Xia Lin College of Computing and Informatics, Drexel University 3141 Chestnut Street, Philadelphia, PA 19104
[email protected]
ASIST 2016, October 14-18, 2016, Copenhagen, Denmark.
ABSTRACT
Scientific software is as important to scientific studies as raw data. Yet attention to this genre of research data is limited in studies on data reuse, citation, and metadata standards. This paper presents results from an exploratory study that examined how information about the reuse of scientific software is presented in current citation practice and in natural language descriptions in research papers. We selected LAMMPS, a popular simulation package used in materials science, for this study. Both descriptive metadata elements and the types of reuse were examined in a sample of 400 research papers. The results indicate that both descriptive metadata elements and reuse types for LAMMPS are presented in incomplete and inconsistent ways, and that this diminishes the value of scientific software as a type of research data. Our findings call for future studies on metadata standards that facilitate the identification of information related to scientific software reuse.
Keywords
Research data, scientific software, research data reuse, research data citation, metadata standard
INTRODUCTION
During the past decade, advancing means for reusing research data has become a strong focus in scholarly communication studies. Two major factors contributed to this research trend. First, the “data deluge” has strongly influenced scientific practice across nearly every knowledge domain (Hey, Tansley, & Tolle, 2009). Second, the need for reproducible research in scientific studies and academic discourse has been increasing (Borgman, 2012; Rung & Brazma, 2013). There has been a good deal of progress in this area with the development of metadata and citation standards for research data, for example, the adoption of the Digital Object Identifier for research data (Paskin, 2005), DataCite’s metadata schema for research data (DataCite International Data Citation Metadata Working Group, 2015; Starr & Gastl, 2011), and Force 11 (Martone, 2014). These developments are important for facilitating data discovery and enabling reuse; however, they do not fully document why and how research data is reused. The absence of this specific information is even more apparent with scientific software, which has traditionally not been cited, although views on this are rapidly changing. That is, software is seen as being just as important as raw data for the outcomes of scientific studies, and researchers claim that it should be included in the research data complex (Ahalt et al., 2015; Herterich & Dallmeier-Tiessen, 2016; Wynholds et al., 2012). As part of this change, researchers also report that the position of software and other parts of the research process, such as code and technical documentation, in the overall landscape of research data is still being debated (Kratz & Strasser, 2014; Parsons & Fox, 2013). Furthermore, given the unique characteristics of software, the standards for research data may or may not be applicable to software in the first place, let alone capture how it is reused. Given these circumstances, and the goal of supporting data reuse and reproducible research, it seems more important than ever to study software citation and reuse.
The research reported in this paper addresses this need. Researchers at Drexel University’s Metadata Research Center, in connection with the Center for Visualization and Decision Informatics (CVDI) and working with materials scientists, have been studying this topic. A chief focus is the simulation software LAMMPS, which is heavily cited and reused in scientific studies. LAMMPS was selected because it is one of the most widely used software packages in molecular simulation studies. The research reported in this paper was undertaken to establish a baseline in this area and to consider metadata requirements for scientific software reuse.
To help contextualize the research presented in this paper, we turn to Huang et al. (2013), who defined scientific software as “all kinds of software that are used to support scientific research.” This definition supports the research on LAMMPS presented in this paper.
The remainder of this paper presents our work. The next section reviews literature on data reuse, data citation, and metadata elements about research data and scientific software. Next, the research questions are presented, followed by a review of our methods, the sample of data, and procedures. The key findings are reported, followed by a discussion, and the paper wraps up with a conclusion. The conclusion highlights how this study can contribute to our knowledge about the applicability of metadata for scientific software, and how this work is informing future research plans.
LITERATURE REVIEW
Reuse of Scientific Software as Research Data
A growing body of studies has examined the reuse of scientific data because of its importance (Borgman, 2012). For example, qualitative studies have been pursued across multiple fields, including archaeology (Faniel et al., 2013), astronomy (Wynholds et al., 2012), earthquake engineering (Faniel & Jacobsen, 2010), and ecology (Zimmerman, 2007; Zimmerman, 2008), to explore researchers’ intentions and behaviors in reusing research data and the underlying factors. Moreover, a growing number of quantitative studies have addressed topics such as which factors encourage researchers to reuse research data (Curty, 2015) and which influence satisfaction with reusing social science data (Faniel, Kriesberg & Yakel, 2015).
An idea attracting growing attention is that research data should include secondary research data objects, like scientific software, code, and technical documentation (Herterich & Dallmeier-Tiessen, 2016; Wynholds et al., 2012), rather than just raw data (e.g., Carlson & Anderson, 2007; Zimmerman, 2007). Even so, there are still many challenges in seeing scientific software as a type of research data (Parsons & Fox, 2013).
Much of the literature on the reuse of scientific software focuses on user needs, licensing, review of code, and user awareness (Goble, 2013; Joppa et al., 2013; Morin et al., 2012). This concept is also covered by studies on software sustainability, most of which examine reuse from the developer’s perspective, i.e., how to take reusability into consideration when the software is being developed (e.g., Crouch et al., 2013; Stewart, Almes & Wheeler, 2010; Wilson et al., 2012).
Zimmerman’s (2008) definition of data reuse helped us to frame our work. For our research, reusing scientific software is defined as the use of software to study a new problem. However, unlike scientific data, LAMMPS, like many software packages, is designed to be reused. As a result, many cases of software reuse could simply be called use of software in more common contexts.
Data and Software Citation
Data citation is an important topic in research studies addressing data reuse. Some of the most important findings are that sharing research data can result in an increased citation rate (Piwowar & Chapman, 2010; Piwowar, Day, & Fridsma, 2007), and that a lack of proper citation policies and infrastructure is preventing research data from being shared and reused (Mooney & Newton, 2012).
To address this problem, a series of policies and technical standards have been proposed. Starr et al. (2015) summarized four components of a research data infrastructure from the perspectives of implementation and system, namely, document data model, publishing workflows, common repository application program interfaces (APIs), and identifiers, metadata and machine accessibility.
However, researchers have identified a few problems in the current software citation practice. For example, Mooney (2011) identified that only 29 percent of the papers had a complete data citation in the reference list. Moreover, without proper citation, information about software reuse cannot be identified by any means. Ince, Hatton & Graham-Cumming (2012) discussed ambiguities in software citation created by the use of natural language description and a lack of context of deployment. As a result, they suggested that journals should establish policies concerning the degree of source code accessibility that is connected with the research paper.
Metadata Standards for Software Reuse
Metadata is central to the data citation framework and to identifying and tracking data reuse. Metadata for research data has been the topic of extensive scholarly and practical efforts, though few of them took research data reuse and scientific software into consideration.
In practice, the Dublin Core Metadata Element Set has been commonly used for research data, where the common practice is to develop an application profile for a given context, the most notable of which is the Dryad Application Profile (Ball, 2009; Diamantopoulos et al., 2011). However, the perceived flat structure of the Dublin Core model (Lagoze, 2000) presents challenges for some in expressing the complex relationships among research datasets and between research datasets and other types of research products, some of which were discussed in Kratz & Strasser's literature review (2014), for example, deep and dynamic citation.
These challenges are part of the motivation for the development of the DataCite Metadata Schema. Besides the descriptive metadata elements, the standard is also able to describe the relationship between two datasets, for example, citation, supplement, and continuation (DataCite International Data Citation Metadata Working Group, 2015; Starr & Gastl, 2011). As exciting as this step is, its applicability to scientific software and research data reuse does not seem to have been a key focus.
Although group efforts addressing software citation are more recent, software citation has, in fact, been a topic of interest at times, and there has been definite interest in the application of metadata standards. One key example is the work of García et al. (2006), who developed a software measurement ontology that covered software citation from the perspective of software development. More recently, Hong (2014) developed a multi-level metadata framework to describe the reusability of scientific software for developers. The effort most relevant to our study is the project called “Code as a research object,” organized by Mozilla, Figshare and GitHub, in which researchers can upload their code to GitHub and get persistent identifiers. This project aims to promote the reuse of academic code as a type of research data and to integrate code and scientific software into the scholarly workflow (Thaney, 2013). These and related developments mark interest in this topic from both academia and industry. The work presented in this paper follows the same path, focusing on how to identify information about research papers reusing scientific software.
RESEARCH QUESTIONS
The goal of this paper is to examine how relationships about scientific software reuse are presented in the current practice of scientific software documentation and citation, and to consider how this may impact future metadata work. Based on the discussions above, if a metadata scheme is to be developed in the future, it should describe not only the dataset's intellectual and physical attributes, but also its relationships with other research products, the most important of which is the reuse relationship. We chose to study LAMMPS because it is known to be a popular molecular simulation package. Additionally, there is little empirical reporting on its reuse. For these reasons, we conducted an exploratory study to answer the following two questions:
• How is LAMMPS reused in research papers?
• What descriptive metadata elements about the reuse of LAMMPS can be identified in the natural language descriptions in the papers?
RESEARCH METHODS
In order to investigate the above research questions, we conducted a two-phase content analysis of a sample of research papers that cited the LAMMPS software. The first phase allowed us to examine the reuse types of LAMMPS as presented by the research papers, and the second phase targeted the metadata elements embedded in the natural language descriptions in the papers. Content analysis was selected because it allowed us to examine the text surrounding each citation of LAMMPS and the context of why and how the software was cited. LAMMPS, short for Large-scale Atomic/Molecular Massively Parallel Simulator, is a molecular dynamics program created under an agreement between Sandia National Laboratories, Lawrence Livermore National Laboratory, and three companies. The project began in the mid-1990s, and its coding work was led by Steve Plimpton at Sandia National Laboratories (“LAMMPS History,” n.d.). We selected this software because it is so popular that a rich body of research papers has cited it, which gives us a broad perspective on software reuse. According to its official citation instructions (“Citing LAMMPS in your papers,” n.d.), one cites LAMMPS by citing the paper titled “Fast Parallel Algorithms for Short-Range Molecular Dynamics” (Plimpton, 1995). Below are the three main steps in our study.
Data Preparation
To prepare our data, we first identified all the papers that cited Plimpton’s 1995 paper on Google Scholar. We searched for the title of Plimpton’s paper to obtain a list of citing papers, and selected the 400 papers with the most citations as our sample. Full-text PDF files of all these papers were downloaded manually.
Classification
In order to determine the different ways in which scientific software is reused, a classification scheme was developed using a qualitative and inductive content analysis method. In other words, we did not have any predetermined scheme before extracting the categories. The first coder used the first 200 papers to extract the categories. An eight-category scheme was developed, which was later consolidated into a four-category scheme after discussions among the project members, including a faculty member in the Material Science program at Drexel University. Our final scheme is summarized in Table 1 below.
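As a rough illustration of the data preparation and classification steps, the Python sketch below selects the most-cited papers from a list and tallies hand-assigned reuse categories. The names CitingPaper, select_sample, and tally, as well as the record fields, are hypothetical. The study gathered citing papers from Google Scholar and coded them manually, so this is only one way the bookkeeping might be organized, not the procedure the authors used.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class ReuseType(Enum):
    """The four categories of the final reuse scheme (Table 1)."""
    REUSE_UNSPECIFIED = "Reuse, unspecified"
    MODIFIED_REUSE = "Modified reuse"
    BENCHMARK = "Benchmark"
    CITE = "Cite"


@dataclass(frozen=True)
class CitingPaper:
    """Hypothetical record for one paper that cites Plimpton (1995)."""
    title: str
    citation_count: int


def select_sample(papers: Iterable[CitingPaper], size: int = 400) -> List[CitingPaper]:
    """Keep the `size` most-cited papers, mirroring the sampling rule in the text."""
    ranked = sorted(papers, key=lambda p: p.citation_count, reverse=True)
    return ranked[:size]


def tally(labels: Iterable[ReuseType]) -> Counter:
    """Count how many papers were assigned to each reuse category."""
    return Counter(labels)
```

The category labels would still come from human coders; the code only ranks papers and counts the labels that the coders assign.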
Category | Definition
Reuse, unspecified | The paper reuses LAMMPS as a whole in the main study, or does not specify which other type of reuse applies.
Modified reuse | The paper uses a modified version of LAMMPS in the main study. Details of the modification may or may not be specified in the paper.
Benchmark | The paper only uses LAMMPS (original or modified version) in the background study.
Cite | The paper does not use LAMMPS per se, but just cites either the software or Plimpton’s paper, including those papers that just use the method presented in the original paper.
Table 1. Final scheme of the reuse types of LAMMPS
The same coder then triangulated this classification scheme using the remaining 200 papers. Moreover, the categorization of all the papers and the scheme itself were further tested by a second coder. Even though some papers were classified differently, no revision to the scheme was found to be necessary. Discussions were held to resolve the differences in classification.
Metadata Element Extraction
All the sentences about LAMMPS or Plimpton’s paper were examined, using content analysis, to extract attributes of how LAMMPS was reused in these papers. Our intention was not to collect a full list of metadata elements; rather, we focused on the three metadata elements that, to our knowledge, are most common and useful for describing data reuse in these papers, namely, version, parallel codes, and models of calculation.
RESULTS
The sections below present the results of this study. First, we present the four categories of the classification scheme, including more detailed explanations and examples of each category. In the second part, we identify metadata elements that were found in the natural language descriptions.
LAMMPS’ Reuse Types
Four categories of the reuse of LAMMPS were discovered in our study, namely, unspecified reuse, modified reuse, benchmark, and cite. The numbers of papers belonging to each category are summarized in Figure 1 below.
Figure 1. Occurrences of the four categories in the reuse scheme
Reuse, unspecified
This category includes all the papers that used the software LAMMPS in their main studies but did not specify whether a modified version of LAMMPS was used. It is reasonable to suspect that at least some of the papers in this category actually belong to the other two reuse-related categories and that the information was simply not addressed in the paper, especially because many of the citing sentences in this category are very short and uninformative. Below are two examples of this category:
• "LAMMPS was used for all MD simulations. [37]" (McMahon, Cheung & Troisi, 2011)
• "LAMMPS [28,29] (Large-Scale Atomic/Molecular Massively Parallel Simulator), developed at Sandia National Laboratories, was used to model [0001] oriented ZnO NWs with diameters ranging from 5 to 20 nm." (Agrawal, Peng & Espinosa, 2009)
Modified Reuse
This category includes those papers that used a modified version of the software LAMMPS in the main study. Below are two examples of this category:
• "The annealing simulations were performed with LAMMPS (large-scale atomic/molecular massively parallel simulator) code from Plimpton at Sandia (modified to handle our force fields)." (Jang et al., 2004)
• "We would like to thank E. Charlaix and P.-F. Gobin for introducing us to this subject, and Dr. S.J. Plimpton for making publicly available a parallel MD code, [25] a modified version of which was used in the present simulations." (Barrat & Bocquet, 1999)
Benchmark
This category includes all the papers that used the software LAMMPS in the background study and/or to compare with the results in the main study. However, it does not include comparative studies that used LAMMPS and some other software to run the same simulation in the main study. Below are two examples of this category:
• "To demonstrate what one should expect of a precise MD trajectory, the same simulation run is performed using LAMMPS again, but this time on four processor cores in parallel. ... We have compared our GPU implementation against LAMMPS running on a fast parallel cluster, see Fig. 8, and we have shown that the GPU performs at the same level as up to 36 processor cores." (Anderson, Lorenz & Travesset, 2008)
• "In order to compare our GPU version to a well-optimized sequential code, we have also compared our CUDA implementation to LAMMPS." (Liu et al., 2008)
Cite
This category involves those papers that did not use the software LAMMPS in the study, but just cited the original paper written by Plimpton in 1995, the methods discussed in Plimpton's paper, and/or the software LAMMPS. This is a non-reuse category, and most of the papers in this category cite the paper in their literature review section, unless they are literature reviews themselves. Below are two examples of this category:
• "Research by Plimpton [38], Plimpton and Hendrickson [37], and Hwang et al. [19] shows that this method provides a better speedup than RD, and can be used with good speedups up to hundreds of processors." (Kale et al., 1999)
• "Software such as LAMMPS [212], IMD [213] and DL_POLY [214] are publicly available to perform large-scale MD simulations on parallel platforms." (Mishin, Asta & Li, 2010)
Metadata Elements about LAMMPS’ Reuse in the Natural Language Descriptions
The following analysis was pursued to answer the second question: to identify the metadata elements related to the reuse of LAMMPS that could be gleaned from the natural language descriptions in the papers. As mentioned in the Research Methods section, we focused our analysis on the following three descriptive metadata elements: version, parallel codes of LAMMPS, and models of calculation, because they are the most commonly discussed elements in our sample.
Version
On its website, all the versions of LAMMPS are listed, from ParaDyn (a parallel code that is discussed later) and LAMMPS 99 and 2001 to the most current version ("LAMMPS History", n.d.). Version is an important attribute of software, given the potentially significant differences between versions and their influence on the results of experiments. However, among all the 305 papers, only five instances of version information were identified.
Citation | Version
Voronov, Papavassiliou & Lee, 2006 | 17 Jan 2005
Karayiannis & Mavrantzas, 2005 | 2005
Sachs, Crozier & Woolf, 2004 | 2003
Zhang & Greenfield, 2007 | 2001
Daub et al., 2010 | "Current version"
Table 2. Version information identified in the sample
"17 Jan 2005" and "2001" are the only two full version names based on LAMMPS' website. For the rest of the examples, it can only be assumed that 2003 and 2005 might be the years when the researchers downloaded the software, and even less information can be inferred from the last example. From the perspective of controlled vocabulary, all the examples that are not full version names are less than ideal practice, and are of limited use to both human and machine readers of the texts.
Codes that are part of LAMMPS package
During the history of LAMMPS' development, some parallel codes developed by Sandia have been gradually integrated into the package, including ParaDyn, WARP, and GranFlow ("LAMMPS History", n.d.). These codes represent both parts of and specific versions of LAMMPS, and are also significant information about the reuse of LAMMPS.
WARP is the most commonly mentioned code in our sample, used by 13 papers. ParaDyn was reported as used in 5 papers, and GranFlow did not appear in any paper. As Table 3 shows, all these papers were written by a small group of researchers and published between 2005 and 2008. Although this remains to be examined in further studies, the pattern might arise because scholars tend to reuse the same piece of software for the sake of efficiency and consistency. All the papers are summarized in Table 3 by group of authors.
Name of the code | Citation
WARP | Ji & Park, 2006; Park, 2006; Park, Gall & Zimmerman, 2005, 2006; Park & Zimmerman, 2005, 2006
WARP | Liang & Zhou, 2006
WARP | Tschopp & McDowell, 2008a, 2008b; Tschopp, Spearot & McDowell, 2007; Tschopp, Tucker & McDowell, 2007, 2008
ParaDyn | Cao & Ma, 2008; Cao & Wei, 2006, 2007a, 2007b; Cao, Wei & Ma, 2008
Table 3. Papers citing parallel codes of LAMMPS organized by group of authors
Models of calculation
In many of the papers we collected, the models used in the calculation or simulation were also discussed. Because of the user-contributed nature of LAMMPS, a large number of functions have been added to this software package since it was developed. As a result, the specific models, as well as the corresponding parts of LAMMPS used in the studies, are also an important metadata element for describing the reuse of LAMMPS. Because it is difficult to determine whether a model mentioned in a paper is a specific model or just a general name, we did not attempt to record all the models mentioned in the papers. Rather, we kept track of the commonly mentioned and clearly articulated ones. Table 4 summarizes the top five models in our sample and their occurrences.
Model | Occurrence
Adaptive Intermolecular Reactive Empirical Bond Order (AIREBO) potential | 15
Embedded atom method (EAM) potential | 15
Nose-Hoover thermostat | 10
Reactive force field (ReaxFF) | 7
Velocity-Verlet algorithm | 6
Table 4. Model information identified in the sample
AIREBO is an extended version of the original REBO potential developed by Brenner (1990). It was proposed by Stuart, Tutein & Harrison (2000) to study hydrocarbon systems, which the original model is unable to simulate. The embedded atom method (EAM) potential was first developed by Daw & Baskes (1983, 1984) to study metallic systems, including the parameters of fractures, surfaces, impurities, and alloy additions.
The existence of these models in the papers suggests that LAMMPS has been used for various purposes in scientific studies. Their importance is supported by the fact that a Google Scholar search for AIREBO together with LAMMPS returns more than 1,000 papers. Information about the models adopted may help researchers and other stakeholders better understand how scientific software, as research data, helps to achieve the goals of each study.
CONCLUSION
Paradise Lost: What is Missing from the Current Software Citation Practice?
The results presented above show that the way LAMMPS is cited in research papers does not help people learn and keep track of much meaningful information about how it is reused in the studies. This general conclusion holds for both descriptive metadata elements and reuse relationships, based on the sample of papers citing LAMMPS.
Descriptive Metadata Elements
In terms of descriptive metadata elements, the conclusion is that there is no position for these elements in the status quo. According to the official instructions for citing LAMMPS, Plimpton's 1995 paper and the link to the software’s website should be cited as substitutes for LAMMPS. However, the only descriptive metadata element about the software itself in these two recommended text strings is the link to the website, which can be interpreted as a type of identifier based on DataCite's specification, version 3.1 (DataCite International Data Citation Metadata Working Group, 2015). None of the other required, recommended, or optional metadata elements in the DataCite schema can be found in the recommended citation practice of LAMMPS. Beyond the structured citation of LAMMPS, there are a few metadata elements describing the reuse of LAMMPS that can be identified in the natural language descriptions in the papers, most notably version, parallel codes, and models of calculation. However, our study also shows that the inclusion of these metadata elements in the papers is rare and highly inconsistent.
LAMMPS’ reuse relationship
As addressed in the literature (DataCite International Data Citation Metadata Working Group, 2015; Kratz & Strasser, 2014), the relationship between research datasets and other types of research products is critical to a robust identification system for scientific software in scholarly works. However, just like descriptive metadata elements, the reuse relationship of LAMMPS is hardly included in current citation practice, because structured citations have no position for this information, no matter what type of resource is cited.
Moreover, it is surprising that a very large number of papers belong to the unspecified reuse category, and that a large number of these papers have a very short description of reusing LAMMPS. These findings have also prompted us to question whether some papers simply failed to mention that they were using a modified version of the software, if not something else. However, this irregularity can only be examined in future studies or by interviewing the authors of all the papers examined, which is impossible given the time span.
Based on our findings and Ince, Hatton & Graham-Cumming's (2012) comments that natural language descriptions are insufficient and unintentionally misleading for describing software citation, we suggest exploring structured and comprehensive ways to include metadata elements and relationships around scientific software in the citation. A large segment of important information is absent from current citation practice, which is a barrier to the realization of reproducible studies.
Limitations
As an exploratory study, our work has a few limitations that should be noted. First, by choosing the 400 most cited papers out of more than 10,000 papers that cited Plimpton's 1995 paper, our sample could be less representative than desired. Second, LAMMPS might be a special case even among the scientific software used in the field of engineering, because of its extensibility and popularity. Last, our reuse type scheme represents an "etic" or "outsider" perspective, which needs to be further validated by evidence collected from researchers and especially software programmers. To address some of these limitations, a materials scientist/engineer who is part of our larger metadata team validated the relationships, but it is possible that a larger sample would reveal different results. Despite these limitations, we believe our research provides important baseline data for further studies, including methods for studying this topic.
Future Directions
This paper presents the methods and results of our exploratory study on how information about the reuse of LAMMPS is presented in research papers. Despite the limitations noted above, we think that the methods and the initial outcomes provide a baseline for future studies. We plan to continue our study with a larger sample to compare different software in similar and different fields. We are in communication with relevant working groups in the Research Data Alliance (RDA), where we may draw upon expertise and conduct research that has value to the goals of the larger RDA effort. Finally, we are also considering conducting qualitative studies to get more detailed “insider” perspectives on this concept from scientists and software programmers. In conclusion, we seek to contribute to the larger dialog on research data reuse, and on the inclusion of software and software citation practices as an important part of this area of study.
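To make the idea of a structured software citation more concrete, the following Python sketch records the three descriptive elements examined in this study (version, parallel code, and models of calculation) together with the reuse type, and flags elements that a paper leaves unreported. The class SoftwareReuseRecord, its field names, and the function is_full_version_name are hypothetical and are not part of DataCite or any existing standard. The version check assumes that full version names are either date-style strings such as "17 Jan 2005" or one of the legacy names "99" and "2001" mentioned in the text.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: date-style version names look like "17 Jan 2005".
_DATE_VERSION = re.compile(
    r"^\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4}$"
)
# Assumption: legacy release names taken from the version list quoted in the text.
_LEGACY_VERSIONS = {"99", "2001"}


def is_full_version_name(version: str) -> bool:
    """Return True if the string looks like a complete LAMMPS version name."""
    text = version.strip()
    return text in _LEGACY_VERSIONS or bool(_DATE_VERSION.match(text))


@dataclass
class SoftwareReuseRecord:
    """Hypothetical structured record of how one paper reused LAMMPS."""
    citing_paper: str
    reuse_type: str                      # one of the four categories in Table 1
    version: Optional[str] = None        # e.g. "17 Jan 2005"
    parallel_code: Optional[str] = None  # e.g. "WARP" or "ParaDyn"
    models: List[str] = field(default_factory=list)

    def unreported_elements(self) -> List[str]:
        """List the descriptive elements that are absent or incomplete."""
        unreported = []
        if self.version is None or not is_full_version_name(self.version):
            unreported.append("version")
        if self.parallel_code is None:
            unreported.append("parallel_code")
        if not self.models:
            unreported.append("models")
        return unreported
```

Applied to the values in Table 2, the check accepts "17 Jan 2005" and "2001" and rejects "2005", "2003", and "Current version", which matches the reading given in the Version subsection. A parallel code may legitimately not apply to a given paper, so an entry in the unreported list is a prompt for review rather than an error.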
ACKNOWLEDGMENTS
The authors would like to thank Dr. Garritt Tucker from the College of Engineering at Drexel University and Dr. Xuemei Gong at Quixey for their important contributions to this project. We would also like to acknowledge support of the NSF/Center for Visualization and Decision Informatics (CVDI) OCI#1160958.
REFERENCES
Agrawal, R., Peng, B., & Espinosa, H. D. (2009). Experimental-computational investigation of ZnO nanowires strength and fracture. Nano Letters, 9(12), 4177–4183.
Ahalt, S., Couch, A., Ibanez, L., & Idaszak, R. (2015). NSF Workshop on Supporting Scientific Discovery through Norms and Practices for Software and Data Citation and Attribution.
Anderson, J. A., Lorenz, C. D., & Travesset, A. (2008). General purpose molecular dynamics simulations fully implemented on graphics processing units. Journal of Computational Physics, 227(10), 5342–5359.
Ball, A. (2009). Scientific Data Application Profile Scoping Study Report. Retrieved from http://www.ukoln.ac.uk/projects/sdapss/papers/ball2009sda-v11.pdf
Barrat, J.-L., & Bocquet, L. (1999). Influence of wetting properties on hydrodynamic boundary conditions at a fluid/solid interface. Faraday Discussions, 112, 119–128.
Borgman, C. L. (2012). The conundrum of sharing research data. Journal of the American Society for Information Science and Technology, 63(6), 1059–1078.
Cao, A., & Ma, E. (2008). Sample shape and temperature strongly influence the yield strength of metallic nanopillars. Acta Materialia, 56(17), 4816–4828.
Cao, A., & Wei, Y. (2006). Atomistic simulations of the mechanical behavior of fivefold twinned nanowires. Physical Review B, 74(21), 214108.
Cao, A., & Wei, Y. (2007). Atomistic simulations of crack nucleation and intergranular fracture in bulk nanocrystalline nickel. Physical Review B, 76(2), 024113.
Cao, A., Wei, Y., & Ma, E. (2008). Grain boundary effects on plastic deformation and fracture mechanisms in Cu nanowires: Molecular dynamics simulations. Physical Review B, 77(19), 195429.
Carlson, S., & Anderson, B. (2007). What Are Data? The Many Kinds of Data and Their Implications for Data Re-Use. Journal of Computer-Mediated Communication, 12(2), 635–651.
Citing LAMMPS in your papers. (n.d.). Retrieved March 17, 2016, from http://lammps.sandia.gov/cite.html
Crouch, S., Hong, N. C., Hettrick, S., Jackson, M., Pawlik, A., Sufi, S., … Parsons, M. (2013). The Software Sustainability Institute: Changing Research Software Attitudes and Practices. Computing in Science & Engineering, 15(6), 74–80. http://doi.org/10.1109/MCSE.2013.133
Curty, R. (2015). Beyond “Data Thrifting”: An Investigation of Factors Influencing Research Data Reuse In the Social Sciences. Dissertations - ALL. Retrieved from http://surface.syr.edu/etd/266
DataCite International Data Citation Metadata Working Group, & others. (2015). DataCite metadata schema for the publication and citation of research data version 3.1. Retrieved from https://schema.datacite.org/meta/kernel3/doc/DataCite-MetadataKernel_v3.1.pdf
Daub, C. D., Wang, J., Kudesia, S., Bratko, D., & Luzar, A. (2010). The influence of molecular-scale roughness on the surface spreading of an aqueous nanodrop. Faraday Discussions, 146, 67–77.
Daw, M. S., & Baskes, M. I. (1983). Semiempirical, Quantum Mechanical Calculation of Hydrogen Embrittlement in Metals. Physical Review Letters, 50(17), 1285–1288. http://doi.org/10.1103/PhysRevLett.50.1285
Daw, M. S., & Baskes, M. I. (1984). Embedded-atom method: Derivation and application to impurities, surfaces, and other defects in metals. Physical Review B, 29(12), 6443–6453. http://doi.org/10.1103/PhysRevB.29.6443
Brenner, D. W. (1990). Empirical potential for hydrocarbons for use in simulating the chemical vapor deposition of diamond films. Physical Review B, 42(15), 9458–9471. http://doi.org/10.1103/PhysRevB.42.9458
Diamantopoulos, N., Sgouropoulou, C., Kastrantas, K., & Manouselis, N. (2011). Developing a metadata application profile for sharing agricultural scientific and scholarly research resources. In Research Conference on
Cao, A. J., & Wei, Y. G. (2007). Molecular dynamics simulation of plastic deformation of nanotwinned copper. Journal of Applied Physics, 102(8), 083511. 7
Metadata and Semantic Research (pp. 453–466). Springer. Retrieved from http://link.springer.com/chapter/10.1007/978-3-64224731-6_45
Ji, C., & Park, H. S. (2006). Geometric effects on the inelastic deformation of metal nanowires. Applied Physics Letters, 89(18), 181916. Joppa, L. N., McInerny, G., Harper, R., Salido, L., Takeda, K., O’Hara, K., … others. (2013). Troubling trends in scientific software use. Science, 340(6134), 814–815.
Faniel, I., Kansa, E., Whitcher Kansa, S., Barrera-Gomez, J., & Yakel, E. (2013). The challenges of digging data: a study of context in archaeological data reuse. In Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries (pp. 295–304). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=2467712
Kalé, L., Skeel, R., Bhandarkar, M., Brunner, R., Gursoy, A., Krawetz, N., … Schulten, K. (1999). NAMD2: greater scalability for parallel molecular dynamics. Journal of Computational Physics, 151(1), 283–312.
Faniel, I. M., & Jacobsen, T. E. (2010). Reusing Scientific Data: How Earthquake Engineering Researchers Assess the Reusability of Colleagues’ Data. Computer Supported Cooperative Work (CSCW), 19(3-4), 355–375. http://doi.org/10.1007/s10606-010-9117-8
Karayiannis, N. C., & Mavrantzas, V. G. (2005). Hierarchical modeling of the dynamics of polymers with a nonlinear molecular architecture: Calculation of branch point friction and chain reptation time of H-shaped polyethylene melts from long molecular dynamics simulations. Macromolecules, 38(20), 8583–8596.
Faniel, I. M., Kriesberg, A., & Yakel, E. (2015). Social scientists’ satisfaction with data reuse. Journal of the Association for Information Science and Technology. Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/asi.23480/full
Kratz, J., & Strasser, C. (2014). Data publication consensus and controversies. F1000Research. http://doi.org/10.12688/f1000research.3979.3 Lagoze, C. (2000). Accommodating Simplicity and Complexity in Metadata: Lessons from theDublin Core Experience. Cornell University. Retrieved from https://ecommons.cornell.edu/handle/1813/5792
García, F., Bertoa, M. F., Calero, C., Vallecillo, A., Ruíz, F., Piattini, M., & Genero, M. (2006). Towards a consistent terminology for software measurement. Information and Software Technology, 48(8), 631–644. http://doi.org/10.1016/j.infsof.2005.07.001
LAMMPS History. (n.d.). Retrieved March 17, 2016, from http://lammps.sandia.gov/history.html
Goble, C. (2014). Better software, better research. Internet Computing, IEEE, 18(5), 4–8.
Liang, W., & Zhou, M. (2006). Atomistic simulations reveal shape memory of fcc metal nanowires. Physical Review B, 73(11), 115409.
Herterich, P., & Dallmeier-Tiessen, S. (2016). Data Citation Services in the High-Energy Physics Community. D-Lib Magazine, 22(1/2). http://doi.org/10.1045/january2016herterich
Liu, W., Schmidt, B., Voss, G., & Müller-Wittig, W. (2008). Accelerating molecular dynamics simulations using Graphics Processing Units with CUDA. Computer Physics Communications, 179(9), 634–641.
Hey, A. J., Tansley, S., & Tolle, K. M. (2009). The fourth paradigm: data-intensive scientific discovery (Vol. 1). Microsoft Research Redmond, WA.
Martone, M. (2014). Data citation synthesis group: Joint declaration of data citation principles. FORCE11. Retrieved from https://www.force11.org/group/jointdeclaration-data-citation-principles-final
Hong, N. C. (2014). Minimal information for reusable scientific software. In Proceedings of the 2nd Workshop on Working towards Sustainable Scientific Software: Practice and Experience. Retrieved from http://www.research.ed.ac.uk/portal/files/16773670/Mini malInfoScientificSoftware.pdf
McMahon, D. P., Cheung, D. L., & Troisi, A. (2011). Why holes and electrons separate so well in polymer/fullerene photovoltaic cells. The Journal of Physical Chemistry Letters, 2(21), 2737–2741.
Huang, X., Ding, X., Lee, C. P., Lu, T., & Gu, N. (2013). Meanings and boundaries of scientific software sharing. In Proceedings of the 2013 conference on Computer supported cooperative work (pp. 423–434). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=2441825
Mishin, Y., Asta, M., & Li, J. (2010). Atomistic modeling of interfaces and their impact on microstructure and properties. Acta Materialia, 58(4), 1117–1151. Mooney, H. (2011). Citing data sources in the social sciences: do authors do it? Learned Publishing, 24(2), 99–108. http://doi.org/10.1087/20110204
Ince, D. C., Hatton, L., & Graham-Cumming, J. (2012). The case for open computer programs. Nature, 482(7386), 485–488. http://doi.org/10.1038/nature10836
Mooney, H., Newton, M. P., & others. (2012). The anatomy of a data citation: Discovery, reuse, and credit. Journal of Librarianship and Scholarly Communication, 1(1), 1.
Jang, S. S., Molinero, V., Cagin, T., & Goddard, W. A. (2004). Nanophase-segregation and transport in Nafion 117 from molecular dynamics simulations: effect of monomeric sequence. The Journal of Physical Chemistry B, 108(10), 3149–3157.
Morin, A., Urban, J., Adams, P. D., Foster, I., Sali, A., Baker, D., & Sliz, P. (2012). Shining Light into Black Boxes. Science, 336(6078), 159–160. http://doi.org/10.1126/science.1218263 8
Park, H. S. (2006). Stress-induced martensitic phase transformation in intermetallic nickel aluminum nanowires. Nano Letters, 6(5), 958–962.
Stuart, S. J., Tutein, A. B., & Harrison, J. A. (2000). A reactive potential for hydrocarbons with intermolecular interactions. The Journal of Chemical Physics, 112(14), 6472–6486.
Park, H. S., Gall, K., & Zimmerman, J. A. (2005). Shape memory and pseudoelasticity in metal nanowires. Physical Review Letters, 95(25), 255504.
Thaney, K. (2013). Code as a research object: a new project. Retrieved April 17, 2016, from https://mozillascience.org/code-as-a-research-object-anew-project
Park, H. S., Gall, K., & Zimmerman, J. A. (2006). Deformation of FCC nanowires by twinning and slip. Journal of the Mechanics and Physics of Solids, 54(9), 1862–1881.
Tschopp, M. A., & McDowell, D. L. (2008a). Dislocation nucleation in Σ3 asymmetric tilt grain boundaries. International Journal of Plasticity, 24(2), 191–217.
Park, H. S., & Zimmerman, J. A. (2005). Modeling inelasticity and failure in gold nanowires. Physical Review B, 72(5), 054106.
Tschopp, M. A., & McDowell, D. L. (2008b). Influence of single crystal orientation on homogeneous dislocation nucleation under uniaxial loading. Journal of the Mechanics and Physics of Solids, 56(5), 1806–1830.
Park, H. S., & Zimmerman, J. A. (2006). Stable nanobridge formation in< 110> gold nanowires under tensile deformation. Scripta Materialia, 54(6), 1127–1132.
Tschopp, M. A., Spearot, D. E., & McDowell, D. L. (2007). Atomistic simulations of homogeneous dislocation nucleation in single crystal copper. Modelling and Simulation in Materials Science and Engineering, 15(7), 693.
Parsons, M. A., & Fox, P. A. (2013). Is data publication the right metaphor? Data Science Journal, 12(0), WDS32– WDS46. Paskin, N. (2005). Digital Object Identifiers for scientific data. Data Science Journal, 4, 12–20. http://doi.org/10.2481/dsj.4.12
Tschopp, M. A., Tucker, G. J., & McDowell, D. L. (2007). Structure and free volume of< 110> symmetric tilt grain boundaries with the E structural unit. Acta Materialia, 55(11), 3959–3969.
Piwowar, H. A., & Chapman, W. W. (2010). Public sharing of research datasets: a pilot study of associations. Journal of Informetrics, 4(2), 148–156. Piwowar, H. A., Day, R. S., & Fridsma, D. B. (2007). Sharing detailed research data is associated with increased citation rate. PloS One, 2(3), e308.
Tschopp, M. A., Tucker, G. J., & McDowell, D. L. (2008). Atomistic simulations of tension–compression asymmetry in dislocation nucleation for copper grain boundaries. Computational Materials Science, 44(2), 351–362.
Plimpton, S. (1995). Fast Parallel Algorithms for ShortRange Molecular Dynamics. Journal of Computational Physics, 117(1), 1–19. http://doi.org/10.1006/jcph.1995.1039
Voronov, R. S., Papavassiliou, D. V., & Lee, L. L. (2006). Boundary slip and wetting properties of interfaces: correlation of the contact angle with the slip length. The Journal of Chemical Physics, 124(20), 204701.
Rung, J., & Brazma, A. (2013). Reuse of public genomewide gene expression data. Nature Reviews Genetics, 14(2), 89–99. http://doi.org/10.1038/nrg3394
White, H. C., Carrier, S., Thompson, A., Greenberg, J., & Scherle, R. (2008). The Dryad data repository: A Singapore framework metadata architecture in a DSpace environment. Universitätsverlag Göttingen, 157.
Sachs, J. N., Crozier, P. S., & Woolf, T. B. (2004). Atomistic simulations of biologically realistic transmembrane potential gradients. The Journal of Chemical Physics, 121(22), 10847–10851.
Willis, C., Greenberg, J., & White, H. (2012). Analysis and synthesis of metadata goals for scientific data. Journal of the American Society for Information Science and Technology, 63(8), 1505–1520.
Starr, J., Castro, E., Crosas, M., Dumontier, M., Downs, R. R., Duerr, R., … others. (2015). Achieving human and machine accessibility of cited data in scholarly publications. PeerJ Computer Science, 1, e1.
Wilson, G., Aruliah, D. A., Brown, C. T., Chue Hong, N. P., Davis, M., Guy, R. T., … others. (2012). Best practices for scientific computing. arXiv Preprint arXiv:1210.0530. Retrieved from http://journals.plos.org/plosbiology/article?id=10.1371/jo urnal.pbio.1001745
Starr, J., & Gastl, A. (2011). isCitedBy: A metadata scheme for DataCite. D-Lib Magazine, 17(1), 9. Stewart, C. A., Almes, G. T., & Wheeler, B. C. (2010). Cyberinfrastructure Software Sustainability and Reusability: Report from an NSF-funded workshop. Indiana University, Bloomington, IN, 5. Retrieved from http://scholar.google.com/scholar?cluster=120088962489 42673195&hl=en&oi=scholarr
Wynholds, L. A., Wallis, J. C., Borgman, C. L., Sands, A., & Traweek, S. (2012). Data, data use, and scientific inquiry: Two case studies of data practices. In Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries (pp. 19–22). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=2232822
9
Zhang, L., & Greenfield, M. L. (2007). Molecular orientation in model asphalts using molecular simulation. Energy & Fuels, 21(2), 1102–1111.
International Journal on Digital Libraries, 7(1-2), 5–16. http://doi.org/10.1007/s00799-007-0015-8 Zimmerman, A. S. (2008). New knowledge from old data the role of standards in the sharing and reuse of ecological data. Science, Technology & Human Values, 33(5), 631–652.
Zimmerman, A. (2007). Not by metadata alone: the use of diverse forms of knowledge to locate data for reuse.
10