Stabilisation operationalised: Using time series analysis to understand the dynamics of research collaboration

Eleftheria Vasileiadou
Institute for Environmental Studies (IVM), Vrije Universiteit Amsterdam
De Boelelaan 1085, 1081 HV Amsterdam, the Netherlands
email: [email protected]
tel: 0031627255429
fax: 0031205989553
Abstract

The aim of the paper is to investigate the use of online data and time series analysis in order to study the dynamics of new types of research collaboration in a systematic way. Two international research teams were studied for more than three years, and quantitative data on their internet use were gathered together with observations of their collaboration patterns. Time series analysis (ARIMA modelling) was performed on their internet use, and specific types of models were related to specific ways of conducting research at a distance. The paper proposes the use of online data and ARIMA models to identify the stabilisation of a complex system, such as a research team, and to investigate everyday research practices.

Keywords: scientific collaboration, time series analysis, ARIMA models, e-science, online data
1. Introduction

There are two mutually reinforcing dynamics in the scientific landscape today on which the present paper is based: on the one hand, the increase in scientific collaboration (Beaver, 2001; Sonnenwald, 2007); and, on the other hand, the introduction and increasing use of information and communication technologies (ICTs) in the sciences
(Heimeriks & Vasileiadou, 2008). However, it is not traditional scientific collaboration only that has increased in importance. Hackett (2005) notes how the social organization of scientific collaboration has changed, with traditional research groups being complemented by episodic working groups,
contractual agreements between
organizations, international collaborations, and interactions between scientists and nonscientists (engineers, companies etc). How can we study the dynamics of these new types of collaboration? The current paper focuses on such types of collaboration and aims at understanding their dynamics through their informal communication patterns. In scientometrics, scientific collaboration has been studied mainly with the use of coauthorship patterns. These data, however, cannot adequately address research questions at the micro-level of everyday research practices, especially with respect to new types of distributed scientific collaboration. The micro-level of everyday research practices has been generally studied with the use of qualitative methods of observing research practices in a lab, or interviewing researchers, which, even though rich in detail, cannot provide measures systematic enough to be quantified and thus used in comparative studies. The current paper aims to fill in this methodological gap by suggesting a step down for scientometrics: just as the dynamics of science can be understood with the study of formal communications of scientists, the everyday dynamics of research can be understood with the study of (informal) communication patterns of researchers in a given context. The aim of this paper is to study the informal communications of research teams, through the use of ICTs, in order to derive quantitative indicators that can help us understand the dynamics of research practices over time. The current paper examines two international research teams’ use of ICTs for a period of more than three years. Real-time online data were gathered and analyzed with time series analysis in order to probe into the collaborative practices of the two teams. This analysis was supplemented with a description of the collaboration dynamics of the two distributed teams.
In what follows, I first review the methodological choices and challenges of studying scientific collaboration. Second, I describe the data collection process and the methodology used. Third, I describe the collaboration dynamics of the two teams, which will assist in understanding the time series analysis. Fourth, I present the results of ARIMA modelling of ICT use for the two teams. In the last section, I discuss both the methodological and theoretical implications of the results, and I draw conclusions.
2. Review

There are three main methods that have been used for the study of scientific collaboration and its dynamics: co-authorships, qualitative observation/interview studies and surveys. First, and predominant in scientometrics, has been the employment of co-authorship as an indicator of collaboration. This is based on two assumptions: first, that the authors of an article or book have all contributed to some extent to the writing of the article or chapter. This, however, need not be the case in the biomedical fields, where hyperauthorship (Cronin, 2001) is becoming the norm, and where the authors’ list can also include honorary co-authors; for instance, the director of the lab, or the scientist who wrote the funding proposal (Katz & Martin, 1997). The second assumption of co-authorship is that all collaborators are captured by co-authorship patterns. This has been contested (Katz & Martin, 1997), as studies combining different methods suggest that co-authorships measure only particular types of collaboration. For instance, Laudel (2002) used interviews with natural scientists in a research centre to identify six types of collaboration and showed that only certain types are rewarded with co-authorship; others (around one third) are rewarded only by a mention in the acknowledgements, while other collaboration activities (about half of them) are not rewarded at all. In conclusion, there is empirical evidence suggesting that co-authorship may not be the best indicator for capturing the dynamics of research and scientific collaboration. Even though the use of co-authorship as an indicator of collaboration is not perfect, it nevertheless has the advantage of being verifiable and systematic, non-intrusive and non-reactive, since the process of measuring does not affect the collaboration process (Katz & Martin, 1997). These are precisely the limitations of the second method used to investigate scientific and research collaboration: qualitative methods of observing scientists’ activities and interviewing them (e.g. Atkinson et al., 1998; Hara et al., 2003; Jeffrey, 2003). What these studies have in common is an understanding of the practice of collaboration as an inherently more “messy” process, with the risks, tensions and local contingencies it entails. With some exceptions, however (e.g. Shrum et al., 2007), they lack a systematic approach that could help compare their results across different settings. Third, surveys have also been used to map the dynamics of scientific collaboration (e.g. Lee & Bozeman, 2005; Birnholtz, 2005; Walsh & Maloney, 2002). The advantage of this method, apart from systematic, quantifiable results that can be replicated in different contexts, is that surveys do not take the concept of collaboration for granted. Instead, some of them rely on the respondents’ understanding of who their relevant collaborators are and what constitutes collaboration. However, self-reported methods of data collection are prone to inaccuracies because of biased responses, poor memory etc. (Yin, 2003). Especially in studies on ICT use in science, self-reported methods may be inadequate (Matzat, 2004; Vasileiadou & Van den Besselaar, 2004). Finally, acknowledgements have been used as data to elicit insights about scientific collaboration networks (Cronin, 1995). As Giles and Councill (2004) note, acknowledgements provide information on networks of technical support, moral support, presentational support, financial support, editorial support and, most importantly, conceptual support, or what they call “peer interactive communication”. These categories reflect specific aspects of scientific collaboration and come closer to, but do not fully cover, informal communication activities. The advantage of acknowledgements is their increasing standardization, which has also enabled automatic harvesting and analysis. On the other hand, a substantial amount of informal scientific communication never reaches the stage of an acknowledgement, because it does not result in a formal publication (Garvey, 1979).
One way of understanding the differences between these methods is to view the sciences as operating at three interrelated but analytically distinct levels (Rip, 1990). First, the researching level consists of the everyday activities of scientists and researchers in their local context of work: gathering data, fiddling with data analysis and writing up results. At this level, we can also imagine activities such as identifying possible collaborators, writing grant proposals etc. Second is the scientizing level, which refers to the formal communication activities of scientists (journal articles, books etc.) and allows the coordination of scientists in scientific fields (Gläser, 2003). Collaboration at this level reflects interactions through journals and conferences, as co-authorship of publications, or the use of the knowledge claims of other scientists in the field, through citations. Third, politicking refers to activities of general mobilization of scientists and scientific communities in the wider world as social actors. So, at the politicking level, collaboration may entail the teaming up of scientists with policy-makers in a technology assessment programme, or collaboration between universities and NGOs. The use of co-authorship patterns for the study of collaboration usually occurs at the scientizing level, whereas qualitative studies investigate collaboration at the researching level. Surveys are also used to investigate the researching level (since the individual researcher is the unit of analysis), but often the results of their analysis are contrasted with results obtained from the scientizing level (e.g. Bozeman & Corley, 2004). This paper investigates a method for studying the dynamics of research collaboration through ICT-based communication. The collaboration teams under study can be conceptualised as emerging systems on the basis of their (informal) communication patterns. As collaboration increasingly takes place between geographically distributed teams, ICTs, and especially email, are becoming ever more prevalent means of communication among researchers (Heimeriks & Vasileiadou, 2008; Nentwich, 2003; Schunn et al., 2002). Therefore, the increasing use of ICTs in research collaboration provides us with a wealth of data, which can help us understand practices of research
collaboration. Moreover, the use of online data has the advantage of a systematic database, which is unobtrusive and quantifiable. However, what do these data reflect? Communication patterns should not be confused with collaboration patterns as is sometimes done (e.g. Caldas, 2003). In this paper, I investigate the hypothesis that ICT communication patterns can provide us with insights into the dynamics of collaborations, and try to identify what these insights are.
3. Methodology

Two case studies were selected, which typified the new types of collaboration: episodic, international, interdisciplinary research teams with contractual agreements between the members. The two research teams (both names are pseudonyms) were funded by the European Commission under the Fifth Framework Programme: DELTA, investigating the use of email in organizations and consisting of eight local research groups from different countries; and ERICOM, investigating web and non-web indicators for the science-technology-economy system and also consisting of eight local groups in different countries. The complete archives of the two team-wide emailing lists (general and managerial list) for each team were collected, from their initiation until one month after the formal end of the projects. This included for DELTA: 2726 emails to the general list and 415 messages to the managerial list; for ERICOM: 697 emails to the general list and 379 emails to the managerial list. Moreover, the attachments sent to the lists were coded as a separate communication activity (if a Word document was sent in an email in two different formats, e.g. in .doc and in .rtf, I coded this as one attachment). Data on the use of the team-wide online repository space were gathered in a different way for each of the two teams. In ERICOM, the logfiles of the online area were studied, and all the downloading activity of files was analysed. (The activity on the ERICOM website was stored as logfiles by the website server using Microsoft Internet Services. All the activity in the internal forum of the website was isolated, and the logfile entries with status code 200, method “get” and a file extension (.***) were selected. In this way only the “successful downloading activity” of the members was coded as blackboard use.) In DELTA, the activity on the team’s website was studied with the use of the emailing list, since every time a member uploaded a document, an email was sent to the list notifying the members. Therefore, in DELTA, all uploading
activity in the online space was analysed. Finally, the team-wide meetings were coded as “the number of people participating in the meeting”. The analysis of the communication activities was performed against time: the week was selected as a natural time unit from the observation of the communication patterns of the teams, since few emails were sent over the weekend. For the DELTA case, the percentage of emails sent during weekends overall was 5.8% of the emails to the general list and 6.3% of the emails to the managerial list. For the ERICOM case, these were 4.9% of the emails to the general list and 2.2% of the emails to the managerial list. Moreover, weeks summarized the behavior of the communication variables without losing any information. The use of communication media for each week was entered into an SPSS database, which contained the following variables for each week: the number of emails sent to the general emailing list, the number of emails sent to the management list, the number of unique attachments sent to each of these lists, the number of participants in meetings, the number of unique documents uploaded to the blackboard (DELTA) and the number of documents downloaded from the internal forum (ERICOM). The number of observations (weeks) was 171 for DELTA and 172 for ERICOM.
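As an illustration of how such a weekly dataset could be assembled from the message archives, the sketch below uses Python and pandas rather than the SPSS database actually employed in the study; the file name and column names (delta_emails.csv, sent_at, list_name) are hypothetical.

```python
import pandas as pd

# Hypothetical input: one row per email, with a timestamp and the list it was sent to.
emails = pd.read_csv("delta_emails.csv", parse_dates=["sent_at"])  # columns: sent_at, list_name

# Count messages per week, separately for the general and the managerial list.
weekly = (
    emails
    .assign(week=emails["sent_at"].dt.to_period("W"))
    .groupby(["week", "list_name"])
    .size()
    .unstack(fill_value=0)                 # one row per week, one column per list
    .rename(columns={"general": "general_list", "managerial": "management_list"})
)

# Reindex over the full project period so that weeks without traffic appear as zero counts.
full_range = pd.period_range(weekly.index.min(), weekly.index.max(), freq=weekly.index.freq)
weekly = weekly.reindex(full_range, fill_value=0)

print(weekly.head())
```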
ARIMA modelling was selected as the method of analysis because of its suitability for dealing with issues of non-stationarity and autocorrelation in a systematic and appropriate way (Hollanders and Vliegenthart, 2008). ARIMA modelling is a method of time series analysis coming from econometrics for the study of time series data, that is, data comprising a set of observations measured at regular intervals of time (Romer, 2006). In communication science, it has also been used, to a limited extent, to model media time series (e.g. Boomgarden and Vliegenthart, 2007), but its potential has not yet been fully exploited (Hollanders and Vliegenthart, 2008). First, in order to assess the distribution shape and to identify possible outliers, each variable was plotted (Warner, 1998). In cases where a trend could be detected, an OLS regression was performed to investigate the direction of that trend. The next step was to investigate autocorrelation (whether the variable has memory, that is, the extent to which the present values of the variable depend on its past values; Gottman, 1981: 33) with the autocorrelation and partial autocorrelation functions (ACF and PACF). The ACF gives the correlation of the variable with itself at a given lag, and the PACF gives the same correlation while controlling for the effect of earlier lags. For the variables that exhibited autocorrelation, a suitable ARIMA model was fitted. Visual inspection of the ACF and PACF graphs leads to a tentative identification of a suitable ARIMA model (McCleary et al., 1980; SPSS Inc., 1987: E9-E11; Hollanders and Vliegenthart, 2008). Of importance is how many spikes the two graphs have (which shows whether it is a first- or second-order model), and how smoothly they decline. If the ACF graph declines smoothly and the PACF abruptly, the variable can be modelled with autoregression. If the PACF graph declines smoothly and the ACF abruptly, the variable can be modelled with a moving average. In practice, it is often not easy to detect these differences, so there are a number of additional criteria for selecting the best-fitting model. The first, most important, criterion is that all parameters of the model are statistically significant. Second, the Akaike Information Criterion (AIC) and the Schwarz Bayesian Criterion (SBC) are used to identify which model fits the data best: the lowest AIC and SBC indicate the most parsimonious model. Finally, the residuals (errors) of each model need to behave like white noise, that is, without any autocorrelation. For this reason, for each model the residuals were saved and the ACF/PACF were computed for those residuals, to confirm that they had no memory.
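This identification routine can be sketched as follows. It is a minimal illustration in Python with statsmodels, not the SPSS Trends procedure actually used; `weekly["general_list"]` refers to the hypothetical dataset sketched earlier, and statsmodels labels the Schwarz criterion BIC rather than SBC.

```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

y = weekly["general_list"].astype(float)

# 1. Visual identification: number of spikes and decay pattern of the ACF/PACF graphs.
plot_acf(y, lags=20)
plot_pacf(y, lags=20)

# 2. Fit a handful of low-order candidate models and compare the information criteria.
candidates = [(1, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 1)]
fits = {order: ARIMA(y, order=order).fit() for order in candidates}
for order, res in fits.items():
    print(order, round(res.aic, 1), round(res.bic, 1))

# 3. Diagnostic check: the residuals of the chosen model should show no autocorrelation.
best = min(fits.values(), key=lambda res: res.bic)
print(acorr_ljungbox(best.resid, lags=[10]))
```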
The ARIMA analysis is based on the concept of random disturbances or shocks in the process of each variable. That means that between two observations a disturbance occurs that affects the level of the series. This disturbance, or random shock, may be any factor which varies across time and interacts with the variable under study in complex ways (McCleary et al., 1980: 40). None of these factors alone could explain the behaviour of the variable under study. For instance, a random disturbance of the frequency of communication may be the vacation of a member in one week, the sickness of a member in another week, or a technical problem with the emailing list in yet another week.

There are three types of processes that can be modelled with an ARIMA model: autoregression (AR), moving averages (MA) and integrated models (I). The AR models, e.g. ARIMA (1,0,0), show that the value of the variable is a function of the preceding one or, in ARIMA (2,0,0) models, two values. Conceptually, the model is one with “memory” of its value, in the sense that each value is correlated with all preceding values. Thus, each shock or disturbance to the system has a diminishing effect on all subsequent time periods. The formula of the model is:

Value(t) = disturbance(t) + φ*Value(t-1)

The MA models, e.g. ARIMA (0,0,1), indicate that each value of the variable is determined by the average of the current disturbance and the previous disturbance. So, in an ARIMA (0,0,1) model a disturbance affects the process for the current week and the week after that, and then it abruptly ceases to affect it. For instance, negative MA values at short lags indicate a process that is influenced by shocks and then tends back to equilibrium. In this sense, “the AR process is said to have a longer memory than the MA process” (Romer, 2006: 173, italics in the original). The formula of the model is:

Value(t) = disturbance(t) + θ*disturbance(t-1)

The integrated models, e.g. ARIMA (0,1,0), reflect the cumulative effect of some process (McCleary et al., 1980). Each value equals the previous value (which is the cumulative sum of the changes/differences in the previous stages) plus some random fluctuation (disturbance). Therefore, an ARIMA (0,1,0) is equivalent to an ARIMA (1,0,0) with φ = 1. This type of process is the sum of all past disturbances, and in this sense the integrated models are the most sensitive to disturbances, since any shock (event) has a “permanent effect” (Hollanders and Vliegenthart, 2008: 53), rather than a diminishing effect (as in the autoregressive model). In a (0,1,0) model, the variable is differenced. The formula is:

Value(t) = disturbance(t) + Value(t-1)
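To make the difference in memory between the three processes concrete, the following sketch (my own illustration, not part of the original analysis) generates an AR(1), an MA(1) and an integrated series from the same sequence of random disturbances, following the formulas above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, phi, theta = 200, 0.6, 0.6
e = rng.normal(size=n)            # random disturbances ("shocks")

ar = np.zeros(n)                  # ARIMA (1,0,0): value depends on the previous value
ma = np.zeros(n)                  # ARIMA (0,0,1): value depends on the previous shock only
integrated = np.zeros(n)          # ARIMA (0,1,0): value is the sum of all past shocks

for t in range(1, n):
    ar[t] = e[t] + phi * ar[t - 1]
    ma[t] = e[t] + theta * e[t - 1]
    integrated[t] = e[t] + integrated[t - 1]

# A shock at time t affects the MA series only at t and t+1, the AR series with a
# geometrically diminishing weight, and the integrated series permanently.
```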
A process described by an integrated model is not stationary. More specifically, a variable that needs differencing is not stationary in its mean. This means that the mean of the variable is not stable in time (it drifts) and depends on the time of observation. Further, a variable may require a log transformation when it is not stationary in its variance: its variance depends on the time of observation. This means that the higher the mean of the variable, the higher its variance. It indicates that the process underlying the series has “naturally defined ‘floors’, which constrain the stochastic behavior of the process” (McCleary et al., 1980). As the variable reaches its ‘floor’, the variance decreases and the behavior of the variable stabilises. This is called autoregressive conditional heteroscedasticity (ARCH). For the variables that required a log transformation, the value 0 was recoded as 0.001, to avoid missing values in the residuals of the ARIMA. In a stationary series, the variance and mean do not depend on the time of measurement. Processes described by MA or AR models only, e.g. (0,0,1) or (1,0,0) models, are stationary. This means that their underlying process is stable in historical time. This brings in the distinction between the observed time series (called a realisation) and the process underlying this observation. Therefore, the variable may fluctuate in time, but the process underlying it may be unchanged and stable in historical time throughout the period of observation (stationary).
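The transformations described above can be sketched as follows, again assuming the hypothetical weekly series from the earlier examples; the original analysis performed the equivalent steps in SPSS.

```python
import numpy as np

y = weekly["management_list"].astype(float)

# Recode zero weeks as 0.001 before taking logs, as in the original analysis,
# so that the log is defined and no missing values appear in the ARIMA residuals.
y_logged = np.log(y.replace(0, 0.001))

# First differencing (the "I" in ARIMA) removes a drifting mean; an ARIMA (0,1,1)
# on the logged series is equivalent to an ARIMA (0,0,1) on this differenced series.
y_diff = y_logged.diff().dropna()
```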
Finally, I studied the actual content of the emails from the two lists, internal documents and reports of the two projects, as well as direct observations of their activities, in order to understand their collaboration dynamics. I read the content of all emails and attachments in order to understand how each medium was used in the team and how the collaboration patterns evolved. Further, I was an observer of the collaboration process and the teams’ communication patterns (24 of the 30 working months for DELTA and 12 of the 40 for ERICOM). Attention was paid to how communication media were used, how the team coordinated its tasks, how decisions were made in each team, and how the team was assessed by the project officer. Decisions and task allocation were studied at the team-wide level. Further, as the time dimension was relevant in this study, the description of the cases relates to how collaboration developed over time. In the next section, I first provide a brief description of the collaboration dynamics of the two teams and how they evolved over time, to be used as a backdrop against which the results of the ARIMA analysis can be understood. Second, I present the results of the time series analysis.
4. Results
4.1 Description of cases

The DELTA project aimed to study the influence of email use on organizational processes and to report its results to the European Commission; its intended outputs were thus academic reports. The team consisted of researchers with more or less similar expertise within the broad social sciences. The team used ICTs heavily, with the general emailing list playing the most important role as the medium for everyday communication and coordination, for decisions and for task allocation. In addition, a managerial list and an online blackboard tool were also used for team-wide communication. Meetings were not frequent (usually twice a year) and had a formal character; they were used mainly to coordinate and allocate tasks between the team members and to make decisions about the work. The use of the blackboard also had a rather formal character, with written team decisions about who was responsible for uploading which document, in which folder, by which date. The work was organised and coordinated at a team-wide level: tasks were broken down into subtasks and assigned to formalised subgroups. This type of coordination was stable in time and did not change. The team-wide coordination proved quite successful, as reports were generally delivered on time to the project officers, and the project received a positive evaluation and overall assessment. The dynamics of
collaboration of the DELTA team (the decision-making processes, the way work was coordinated and tasks were allocated) remained stable over time, and in this sense we can say that the team managed to create and reproduce a rather stable collaboration pattern. The ERICOM team, on the other hand, aimed at developing and testing new science indicators using web and non-web data; the team thus had a risk factor built into its research design (to develop something new). The team consisted of experts from different fields, with more diverse disciplinary backgrounds than DELTA. At the same time, all research stages (data gathering, analysis etc.) ran in parallel. All these factors contributed to more complexity in ERICOM than in DELTA. The communication of the team was very much based on frequent face-to-face interactions, and the two emailing lists (the general and the managerial) were used less than in DELTA. An important characteristic of media use in ERICOM was the change in character over time. The general list was used to a limited extent in the first year of the project and only for the exchange of information; then, in the second year, there was a period when the list was used for coordination activities, brainstorming and decisions; in the last half of the project, it became again a tool only for the exchange of information and work. Meetings also changed in character: in the first year, they were mainly used for brainstorming and the discussion of work; in the second year they focused on managerial and coordination issues; and in the final year they attained a formal character, focusing on administrative procedures and the responsibilities of the team towards the project officer. The coordination of work in ERICOM also changed over time. Team-wide coordination and task allocation for the team-wide tasks started only after the first year of work and proved problematic, with delays in managerial and substantive tasks. During the last year of the project, each local team was more or less involved with its own tasks, and there was hardly any team-wide coordination. Decision-making processes also changed: in the first year of the project, team members participated in managerial or content-related decisions. That, however, proved problematic, and so, in the middle of the project, a professional
project manager was hired. After this, most decisions on team-wide issues became his responsibility. It is also important to note here that after the professional manager was hired, there was a shift in the project towards management and administration. For instance, during meetings there was less focus on the substantive results than in the first year of the project, and more on the managerial and contractual responsibilities of each partner. In short, the collaboration dynamics in ERICOM changed substantially over time and, in comparison to DELTA, the team did not manage to create a pattern of collaboration that was stable in time. Finally, as a result of managerial and substantive problems, the team received negative evaluations from the project officers more than once, and there was always a danger of the project being stopped.
4.2 Time series analysis

First, graphs of the distribution of each variable over time were produced. This helped identify possible trends or outliers in the data (Appendix, figures 1-10). In the blackboard uploading activity and the general list attachments of DELTA, an outlier was identified and substituted with the next largest value. The substitution of outliers in time series analysis is recommended by Warner (1998), as “outliers may have a distorting effect on the identification of ARIMA models” (McCleary et al., 1980: 128). Then, Ordinary Least Squares (OLS) regression was performed with the variable ‘week’ (which measured time in weeks), in order to identify possible linear trends. The OLS regression showed a statistically significant negative trend for the use of the management list in the DELTA case (Adj. R Sq. 0.127; Sig. 0.000; Beta -0.356); a statistically significant positive trend for the managerial list attachments in ERICOM (Beta 0.222; Adj. R Sq. 0.044; Sig. 0.004); a statistically significant negative trend for the general list in ERICOM (Beta -0.312; Adj. R Sq. 0.092; Sig. 0.000); and a statistically significant negative trend for the use of the internal forum in ERICOM (Beta -0.359; Adj. R Sq. 0.121; Sig. 0.000).
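The trend test reported above can be sketched in Python with statsmodels as follows. The SPSS output quoted in the text reports standardized Beta coefficients, whereas this sketch prints the raw slope; the sign and significance are what matter for the direction of the trend, and the series name is again hypothetical.

```python
import numpy as np
import statsmodels.api as sm

y = weekly["management_list"].astype(float).values
week = sm.add_constant(np.arange(1, len(y) + 1))   # 'week' counter as the single predictor

trend = sm.OLS(y, week).fit()
print(trend.params)          # sign of the week coefficient gives the direction of the trend
print(trend.rsquared_adj)    # comparable to the adjusted R squared reported in the text
print(trend.pvalues)
```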
The negative trend in the management list in DELTA indicates a decrease in managerial communication. There are two possible interpretations of this result: either the communication of managerial issues was more relevant at the beginning of the project than at the end, or the communication of managerial issues was gradually taken over by other media. Since there was no positive trend in the general list (or any other medium), we can assume that this suggests less relevance of managerial communication over time, possibly through gradual learning of management and administration tasks and responsibilities. In ERICOM, the negative trend of the general list and the positive trend in the managerial list attachments reflect the shift in the balance between management and substantive work, also identified by the observation of the team. In the second phase of the project, after the new professional manager took over, the team’s communication revolved around managerial issues and the use of the general list for content discussions and brainstorming declined. Following this, I computed the autocorrelations and partial autocorrelations (ACF/PACF) of the variables. The analysis revealed similar patterns in the two cases: in DELTA, significant autocorrelations for the general list and its attachments, and for the managerial list and its attachments. For the meetings and the blackboard uploads, there is no significant autocorrelation at any lag, which indicates that there is no memory in these variables. In ERICOM, the ACF/PACF revealed memory in all variables except for the meetings. The autocorrelations in the series reflect the memory of these variables, that is, the extent to which their present values depend on their past (either past values or past disturbances). For the use of the blackboard, the lack of autocorrelation could be related to its formal function: the description of the cases indicated that the use of the online blackboard in DELTA was regulated by decisions, and functioned as a formal document repository of the team. It is reasonable to suggest that the more formal a medium is and
the more prescribed and regulated its use, the less its use is determined by its history or by past events in the team. In contrast, the activity in the two lists, whose use was not regulated, revealed memory. (The lack of memory in the meetings variable is probably related to the very small number of meetings, but it should also be noted that meetings had a formalized character: the number of people participating in a meeting was decided each time on the basis of the work at hand, and not on the basis of how many people had participated in previous meetings.) Next, ARIMA models were fitted to the variables with memory (Table 1). In the DELTA case study, for the general list an ARIMA (0,0,1) was fitted with θ = -0.40 and a constant. For the management list an ARIMA (0,0,1) was fitted with θ = -0.2, with a constant and a log transformation. For the attachments of the general list an ARIMA (0,0,2) was fitted with θ1 = -0.3 and θ2 = -0.2, with a constant. For the attachments of the management list an ARIMA (0,0,1) was fitted with θ = -0.4 and a constant.
DELTA                          Parameter   Estimates   Std. error   t         Approx. Sig.
General list                   MA1         -0.401      0.071        -5.672    0.000
                               Constant    15.889      1.394        11.395    0.000
General list attachments       MA1         -0.295      0.075        -3.926    0.000
                               MA2         -0.228      0.075        -3.028    0.003
                               Constant     2.734      0.376         7.279    0.000
Logged management list         MA1         -0.238      0.088        -2.696    0.008
                               Constant    -2.575      0.454        -5.673    0.000
Management list attachments    MA1         -0.381      0.088        -4.324    0.000
                               Constant     0.577      0.168         3.442    0.001

Table 1: Univariate ARIMA specifications for DELTA
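A parameter table of this kind could be reproduced roughly as sketched below, using Python and statsmodels instead of the SPSS Trends module used for Table 1; note that the sign reported for an MA coefficient can differ between packages because of opposite sign conventions in the MA term, and the series name is again hypothetical.

```python
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf

y = weekly["general_list"].astype(float)   # hypothetical weekly series, as above

# ARIMA (0,0,1) with a constant: the specification reported for the DELTA general list.
res = ARIMA(y, order=(0, 0, 1), trend="c").fit()
print(res.summary())                       # MA(1) coefficient, constant, standard errors, p-values

# Diagnostic check: the fitted model should leave no memory in the residuals.
plot_acf(res.resid, lags=20)
```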
As noted in the methodology, the MA models indicate that each value of the variable is determined by the current disturbance and the previous disturbance. According to the results obtained here, the use of the two emailing lists and their attachments are affected by disturbances or shocks, with the general list attachments having a longer memory than the other variables. That means that, for instance, a decision or any other event (random shock) would influence the use of the managerial list for two weeks, and then it would stop influencing it. For the case of general list attachments, this influence would last three
weeks. The ARIMA (0,0,2) model of the attachments indicates that the exchange of work as a process is more sensitive to disturbances and has a longer memory. Moreover, in the models for these media the coefficient (θ) is negative, which means that the event or disturbance has a reverse effect on their use the following week. This is similar to the results of Hollanders and Vliegenthart (2008), indicating that after the shocks the series tends to move back to equilibrium, and shocks are corrected for. So a conflict in week 20, for instance, may decrease the use of the managerial list that week, but its effect on the value of the next week’s emails (this decreasing effect) would be reversed. In addition, the moving average models indicate a certain stability of the variables over time, as the influence of external “disturbances” lasts only in the short term. In this sense, the use of the two lists and their attachments in DELTA follows rather stable patterns in time. The managerial list did require a log transformation, which means that the underlying process of the variable was not stationary in its variance, and the higher the mean of the variable, the higher its variance (McCleary et al., 1980: 52). However, the process underlying all other media variables was stationary. In ERICOM, the patterns of memory of the variables were different (Table 2). For the attachments of the general list an ARIMA (1,0,1) was fitted with θ = 0.7 and φ = 0.9, with a constant. For the general list an ARIMA (0,1,1) model with θ = 0.7 was fitted without a constant. For the downloading activity from the internal forum an ARIMA (0,1,1) was fitted with a natural log transformation, with θ = 0.6, without a constant. For the management list an ARIMA (0,1,1) was fitted with a natural log transformation, with θ = 0.92, without a constant. Finally, for the attachments of the managerial list an ARIMA (1,0,1) was fitted with θ = 0.8 and φ = 0.9, with a constant. The coefficients in the moving average models are all positive, indicating that the influence of the disturbance of the previous time lag is positive. So, in the (0,1,1) models, a shock in week t-1 influences the change scores in week t: a larger than expected
change in week t-1 results in a greater than expected change in week t, and the influence of the shock is not corrected for. On the contrary, there is a positive feedback loop from the previous shock. This is the opposite of DELTA, where the coefficients were negative, and the influence of a shock in each week was corrected for in the following week.
ERICOM                                        Parameter   Estimates   Std. Error   t         Approx. Sig.
General list (after differencing)             MA1          0.698      0.056        12.551    0.000
                                              Constant    -0.005      0.094        -0.052    0.959
General list attachments                      AR1          0.894      0.070        12.816    0.000
                                              MA1          0.705      0.111         6.363    0.000
                                              Constant     1.252      0.472         2.652    0.009
Logged management list (after differencing)   MA1          0.922      0.032        28.808    0.000
                                              Constant     0.025      0.025         0.992    0.323
Management list attachments                   AR1          0.898      0.095         9.431    0.000
                                              MA1          0.799      0.131         6.080    0.000
                                              Constant     0.564      0.206         2.739    0.007
Logged internal forum (after differencing)    MA1          0.624      0.077         8.118    0.000
                                              Constant    -0.020      0.152        -0.133    0.895

Table 2: Univariate ARIMA specifications for ERICOM
The two lists and the internal forum in ERICOM are described by an integrated process. A random shock, such as a conflict in the team, had a cumulative effect on the number of emails sent through the general list, influencing it for all following weeks. In general, all variables show a permanent effect of random shocks (integrated models). This is very similar to the models for the attachments of the two lists, which were ARIMA (1,0,1) with a φ coefficient very close to 1 (0.9). This again indicates long-term memory of the random shocks. It can be argued that this is related to the fact that in ERICOM media use did not stabilize over time. As described in the previous section, the role of the media changed throughout time, which was not the case with DELTA. This change in the role of media over time characterizes the lack of stabilization of communication patterns, and that is why these patterns were largely influenced by shocks.
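The contrast between the two teams can be illustrated with a small simulation (my own sketch, not part of the original analysis): a single shock is fed into an MA(1) process with a negative coefficient, as found in DELTA, and into an integrated process, as found in ERICOM.

```python
import numpy as np

n = 10
shock = np.zeros(n)
shock[2] = 1.0                          # one disturbance in week 2, none elsewhere

theta = -0.4                            # DELTA-style MA(1): effect reversed the next week, then gone
ma_path = shock + theta * np.concatenate(([0.0], shock[:-1]))

integrated_path = np.cumsum(shock)      # ERICOM-style (0,1,0): the shock shifts the level permanently

print(ma_path)          # [0, 0, 1, -0.4, 0, 0, ...]
print(integrated_path)  # [0, 0, 1, 1, 1, 1, ...]
```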
But why was media use in ERICOM more sensitive to external events than in DELTA? It could be the result of uncertainty about the course of the project, since the project received very negative reviews. It could also be a result of uncertainty at the substantive level, because of the innovative character of the project. We need to keep in mind that ERICOM was more heterogeneous than DELTA in its membership and exhibited higher complexity, as all research stages ran in parallel (gathering of data, development of indicators and development of software). The constant insecurity possibly resulted in an inability of ERICOM to function as a team, to coordinate and manage its activities and to create a stable communication pattern. Therefore, the environment of the team became more dominant in influencing media use. This is reflected by the integrated component (0,1,0) in the ARIMA models of the media variables in ERICOM [or by the (1,0,1) models with a φ coefficient very close to 1]. In this respect, the DELTA team managed to coordinate itself, and therefore media use stabilized over time and was not that sensitive to external events. DELTA managed to create and reproduce a communication pattern over time, which was not the case for ERICOM.
5. Discussion

The aim of the paper was to investigate whether we can use ICT-based data to understand the dynamics of research collaboration. The analysis of the two cases suggested that different components of the time series analysis of ICT data relate to different processes of collaboration. First, the linear trends that some media variables exhibited over time indicated the overall orientation and the priorities of each team: towards coordination and management, or towards substantive communication and intellectual work. Second, it was suggested that the extent to which a variable had memory (ACF function) is related to the degree of flexibility in the use of the media. If the use of a medium is regulated (who would use it, when etc.), then its use at time t would probably not depend on its use at time t-1, but on the regulations of its use. So, the medium whose frequency showed a lack of memory was the medium with a formalised role, a regulated function and a prescribed use.
Third, the different ARIMA models were related to the degree of stabilization of communication and collaboration patterns in a research team over time. Stationary models, and especially moving average models with a limited short-term memory (in DELTA), indicated the stability of the coordination of a team over time; the team managed to reproduce its communication and collaboration patterns over time. In contrast, the integrated models in ERICOM, with log transformations, indicated the change of media use over time, which was discussed in the description of the case study. The fact that the media variables in ERICOM did not stabilise with time was also related to the lack of stabilisation of collaboration patterns: decision-making processes changing over time, an inability to coordinate tasks, and a lack of consistent management. Moreover, the ARIMA models reflect that, because of this lack of stabilisation, external events (random shocks) become prevalent in defining the communication patterns of the team. These insights constitute a novel theoretical contribution to our understanding of the patterns that underlie the dynamic use of media over time, a topic hardly studied or theorised in the context of research collaboration. The paper showed that new developments in research collaboration and its dynamics can be captured with quantitative, systematic indicators, using ICT-based data. This is a fruitful way forward for understanding the variations in media use, and the underlying patterns which media use follows as it evolves in a system. In this sense, we can understand research teams as complex systems emerging from their communication patterns, and the stabilisation of these patterns over time can be understood as the success of the research team in organizing itself: in coordinating its tasks and developing stable collaboration patterns. In this respect, ERICOM showed higher complexity than DELTA, with a lack of stabilisation of its communications. These processes can be studied with ARIMA analysis of the communication patterns of the teams, especially with the use of ICT-based data. The communication patterns under study here were the team-level communications. As Arrow et al. (2000) point out, in small groups as complex systems there are three levels of dynamics that shape the research team: the local, the global and the contextual dynamics. Local dynamics refer to the activities of the individual researchers and local
institutes. Local dynamics give rise to global (team-level) dynamics, which are shaped by them. Global dynamics refer to team-level variables (e.g. team communications). Contextual dynamics refer to the impact of features of the contexts to which the team belongs, which shape and constrain the local and global dynamics (Arrow et al., 2000: 40). For instance, a change in the Commission’s funding mechanism may influence the dynamics of the research teams under study. This distinction helps point out possibilities for further research. The present study was limited to the dynamics at the global level, and more precisely to global-level communications. Personal emails and communication can reveal the local dynamics, and also how the local dynamics give rise to the global dynamics. Contextual dynamics can reveal how the global dynamics are shaped and constrained. Both the local and the contextual dynamics can thus better illuminate why and how the global dynamics emerge. This was not done in the current paper. A study of how all three levels relate to each other is recommended for further research, as it can contribute to a more complete understanding of distributed teams. The aim of this paper was to propose a specific methodology which can help us examine, in a systematic and quantifiable way, developments at the researching level. Avoiding both the problems of qualitative case study analysis, whose results cannot be systematically compared, and the problems of self-reported methods, ARIMA models of the informal communication of research teams can be used to understand the stabilisation of research groups as complex systems. The increasing use of ICTs in research provides a wealth of systematic data as a precise record of behaviour that was not traceable before (e.g. Prabowo & Thelwall, 2008). These data can relate, as shown here, to collaboration dynamics, such as how the team manages its tasks and how it coordinates itself. Real-time electronic data constitute important systematic and quantifiable information for our understanding of how new modes of collaboration function, how research is conducted, how distributed teams organise themselves, and whether collaboration patterns such as coordination and
decision-making activities stabilise. Of course, distributed teams differ in the degree to which they use different ICTs. Here, for instance, ERICOM used ICTs less and meetings more than DELTA, as pointed out in the description of the two teams. However, this does not invalidate the results. The different ARIMA components are not sensitive to the level (mean) of the variable. If a variable is described by an ARIMA (0,0,1) model with a negative θ coefficient, this indicates that random shocks have a short-term impact and that the variable has one equilibrium to which it returns, whether its mean value is 5 or 20. What matters is whether that mean is stable in time (stationary). There are several limitations to this exploratory paper. First, the analysis restricted itself to two case studies with a very specific structure: FP5 project collaborations. Even though the aim of this paper was not the generalization of results to all collaborative endeavours, the distributed character of these collaborations did play a role in the findings. The use of ICTs in distributed collaborations is essential, in contrast to, e.g., traditional forms of collaboration between two scientists in the same department. Moreover, discipline was not taken into account here, because the teams were interdisciplinary. However, the use of ICTs differs across disciplines (Heimeriks et al., 2008). Further research could take into account different forms of distributed collaboration (e.g. in the natural sciences), which may bring out different patterns of collaboration and, as a result, different ARIMA models of ICT use. In conclusion, the analysis in this paper can be understood as exploring a systematic indicator of the stabilisation of the communication patterns of a system in time. Rather than restricting the use of ARIMA to studies at the researching level, developments at the scientizing and politicking levels can also be understood similarly. When does European collaboration stop being symptomatic and funding-driven and emerge as a stable system? How does a new discipline emerge as a stable configuration of its communication system? These are the types of questions that can be answered with the use of ARIMA analysis as indicated in this paper.
Acknowledgements: This paper is based on the PhD dissertation of the author and has benefited from comments by Rens Vliegenthart, Gaston Heimeriks, Stuart Blume, and suggestions from two anonymous reviewers. Further, Nigel Gilbert, Peter van den Besselaar, Isidro Aguillo, Manolis Mavrikakis, and Alexander Schouten have assisted with collecting and manoeuvring the data. Research underlying this paper was funded by the European Commission (IST-FP5) and the Amsterdam School for Social Sciences Research. The members of both teams explicitly agreed to the observation of their communications and collaboration patterns at the time of the projects. Further, the coordinators of both teams granted explicit access to the public material for research purposes. No archive of private emails was used for this paper.
References

Arrow, H., McGrath, J. E. & Berdahl, J. L. (2000). Small groups as complex systems; Formation, Coordination, Development and Adaptation. Thousand Oaks: Sage.
Atkinson, P., Batchelor, C. & Parsons, E. (1998). Trajectories of Collaboration and Competition in a Medical Discovery. Science, Technology and Human Values, 23(3), 259-284.
Beaver, D. D. (2001). Feature report: Reflections on Scientific Collaboration (and its study). Scientometrics, 52(3), 365-377.
Birnholtz, J. P. (2005). When do researchers collaborate? Toward a model of collaboration propensity in science and engineering research. Unpublished PhD thesis, University of Michigan.
Boomgarden, H. J. & Vliegenthart, R. (2007). Explaining the rise of anti-immigrant parties: The role of news media content. Electoral Studies, 26, 404-417.
Bozeman, B. & Corley, E. (2004). Scientists’ collaboration strategies: Implications for Scientific and Technical Human Capital. Research Policy, 33(4), 599-616.
Caldas, A. (2003). Are newsgroups extending ‘invisible colleges’ into the digital infrastructure of science? Economics of Innovation and New Technology, 12(1), 43-60.
Cronin, B. (2001). Hyperauthorship: A Postmodern Perversion or Evidence of a Structural Shift in Scholarly Communication Practices? Journal of the American Society for Information Science and Technology, 52(7), 558-569.
Cronin, B. (1995). The scholar’s courtesy; The role of acknowledgements in primary communication process. London: Taylor Graham.
Garvey, W. D. (1979). Communication: The Essence of Science. Oxford: Pergamon Press.
Gläser, J. (2003). What Internet Use Does and Does not Change in Scientific Communities. Science Studies, 16(1), 38-51.
Giles, C. L. & Councill, I. G. (2004). Who gets acknowledged: Measuring scientific contributions through automatic acknowledgment indexing. Proceedings of the National Academy of Sciences of the United States of America, 101(51), 17599-17604.
Gottman, J. M. (1981). Time-series analysis; A comprehensive introduction for social scientists. Cambridge: Cambridge University Press.
Hackett, E. J. (2005). Introduction; Special Guest-Edited Issue on Scientific Collaboration. Social Studies of Science, 35(5), 667-671.
Hara, N., Solomon, P., Kim, S. & Sonnenwald, D. H. (2003). An emerging view of Scientific Collaboration. Journal of the American Society for Information Science, 54(10), 952-965.
Heimeriks, G. & Vasileiadou, E. (2008). Changes or Transition? Analysing the use of ICTs in sciences. Social Science Information, 47(1), 5-29.
Heimeriks, G., van den Besselaar, P. & Frenken, K. (2008). Digital Disciplinary Differences: An analysis of computer mediated science and “Mode 2” knowledge production. Research Policy, 37(9), 1602-1615.
Hollanders, D. & Vliegenthart, R. (2008). Telling what yesterday’s news might be tomorrow: Modeling media dynamics. Communications, 33, 47-68.
Jeffrey, P. (2003). Smoothing the Waters: Observations on the Process of Cross-Disciplinary Research Collaboration. Social Studies of Science, 33(4), 539-562.
Katz, J. S. & Martin, B. R. (1997). What is research collaboration? Research Policy, 26(1), 1-18.
Laudel, G. (2002). Collaboration and Reward: What do we measure by co-authorships? Research Evaluation, 11(1), 3-15.
Lee, S. & Bozeman, B. (2005). The Impact of Research Collaboration on Scientific Productivity. Social Studies of Science, 35(5), 673-702.
Matzat, U. (2004). Academic communication and IDGs. Social Networks, 26(3), 221-255.
McCleary, R., Hay, R. A., Jr., Meidinger, E. E. & McDowall, D. (1980). Applied time series analysis for the social sciences. London: Sage.
Nentwich, M. (2003). Cyberscience; Research in the Age of the Internet. Vienna: Austrian Academy of Sciences Press.
Prabowo, R. & Thelwall, M. (2008). Finding and tracking subjects within an ongoing debate. Journal of Informetrics, 2(2), 107-127.
Rip, A. (1990). An Exercise in Foresight: The Research System in Transition. In S. E. Cozzens (Ed.), The Research System in Transition (pp. 387-401). Dordrecht: Kluwer.
Romer, D. (2006). Time Series Models. In D. Romer, K. Kenski, K. Winneg, C. Adasiewicz & K. Hall Jamieson (Eds.), Capturing Campaign Dynamics, 2000 and 2004; The National Annenberg Election Survey (pp. 165-243). Philadelphia: University of Pennsylvania Press.
Schunn, C., Crowley, K. & Okada, T. (2002). What Makes Collaborations Across a Distance Succeed? The Case of the Cognitive Science Community. In P. Hinds & S. Kiesler (Eds.), Distributed Work (pp. 407-430). Cambridge: The MIT Press.
Shrum, W., Genuth, J. & Chompalov, I. (2007). Structures of Scientific Collaboration. Cambridge: The MIT Press.
Sonnenwald, D. H. (2007). Scientific collaboration. In B. Cronin (Ed.), Annual Review of Information Science and Technology: Vol. 41 (pp. 643-681). Medford, NJ: Information Today.
SPSS Inc. (1987). Trends™ SPSS/PC+™ For the IBM PC/XT/AT. Chicago: SPSS Inc.
Vasileiadou, E. & van den Besselaar, P. (2004). One method fits all? Studying a research team using different data and methods. Presented at the Symposium New Research for New Media “Innovative Research Methodologies”, Tarragona.
Walsh, J. P. & Maloney, N. G. (2002). Computer Network Use, Collaboration Structures and Productivity. In P. Hinds & S. Kiesler (Eds.), Distributed Work (pp. 433-458). Cambridge: The MIT Press.
Warner, R. M. (1998). Spectral Analysis of Time-Series Data. New York: The Guilford Press.
Yin, R. K. (2003). Case Study Research; Design and Methods (3rd edition). Applied Social Research Methods Series, Volume 5. Thousand Oaks: Sage Publications.