Scientometrics (2015) 102:285–306 DOI 10.1007/s11192-014-1310-y

Embedded information structures and functions of co-authorship networks: evidence from cancer research collaboration in India

Avinash Kshitij • Jaideep Ghosh • Brij Mohan Gupta

Received: 3 January 2014 / Published online: 7 May 2014 © Akadémiai Kiadó, Budapest, Hungary 2014

Abstract In this exploratory study, we analyze co-authorship networks of collaborative cancer research in India. The complete network is constructed from bibliometric data on published scholarly articles indexed in two well-known electronic databases. The data cover two 6-year windows, 2000–2005 and 2006–2011 inclusive. Employing a number of important metrics pertaining to the underlying topological structures of the network, we discuss implications for effective policies to enhance knowledge generation and sharing in cancer research in the country. With some modifications, our methods can readily be applied to examine the policy structure of related disciplines in other countries.

Keywords Cancer research · Collaboration network · Network analysis

A. Kshitij (corresponding author)
Centre for Studies in Science Policy, School of Social Sciences, Jawaharlal Nehru University, New Delhi 110067, India

A. Kshitij · J. Ghosh · B. M. Gupta
Council of Scientific and Industrial Research – National Institute of Science, Technology and Development Studies (CSIR – NISTADS), K. S. Krishnan Marg, PUSA Gate, New Delhi 110012, India

Introduction

In academic and industrial research, different patterns of collaborative activity among researchers have been observed in theoretical, empirical, phenomenological and experimental studies. More commonly, however, scholarly articles published in peer-reviewed journals and well-known conference proceedings appear under joint or multiple authorship. This reveals characteristic patterns of association between researchers. In this regard, one may think of researchers as affiliated with the collaborative works that culminate in the eventual publication of scholarly articles. Such affiliations are built on the joint appearance of researchers in collaborative activities, and the affiliation binds researchers in genuine socio-professional bonds (Barabási et al. 2002; Newman 2001a). Affiliation network data can be obtained reasonably accurately, because group membership in the form of co-authorship can be established with a fairly high degree of precision. A research affiliation network is based on the set of socio-professional ties between researchers. These ties are made manifest by the scholarly articles the researchers co-author in refereed journals and conference proceedings. Because co-authors share an established tie of contributory involvement in a piece of research, they are likely to be genuinely acquainted with one another. In this sense, a professional research collaboration network embodies truly social interactions between researchers (Newman 2001a). Of course, collaborative research in certain disciplines of the natural sciences has from time to time involved a very large number of collaborators, many of whom may not actually be acquainted with one another. Whether such large collaboration groups involve truly social interaction between researchers is debatable (Shrum et al. 2007). In the present study, we do not encounter any such collaborations. A network study of the patterns of collaboration in cancer research has important implications for research policy.
Among other things, such a study can help improve the present state of research with regard to funding, institutional support and incentives, and professional encouragement for enhanced collaborative activity (Bozeman and Corley 2004; Eckhouse et al. 2008; Lewison et al. 2010). Simply examining the time series of research output cannot reveal which research areas are performing exceptionally well and which require incentives for enhanced productivity. By contrast, many interesting characteristics of collaborative research can be visualised by examining patterns of collaboration with the techniques of social network analysis (Acedo et al. 2006; Barabási et al. 2002; Bozeman and Corley 2004; Melin and Persson 1996; Newman 2001a, b). For example, hierarchical networks are more susceptible to breakdowns in information flow than highly cohesive networks, in which many redundant paths provide channels for information. Also, if a complete research network is fragmented into several disconnected sub-networks, many researchers will be unable to establish ties with others. On the other hand, if the complete network contains bridges of mediating ties between its sub-networks, information may flow more effectively, resulting in a higher rate of innovative production (Burt 2004; Granovetter 1973). Our analysis helps to identify some issues that bear on research policy in India. Similar considerations may apply to related areas of research in other countries as well.

The present scenario in India and the research question

Cancer is now one of the major life-threatening diseases around the world, and it causes an increasing proportion of deaths in a developing country like India. At the same time, major advances in fundamental research and technology have been made in biology and biomedicine through the sequencing of the human genome and the identification of various causes of cancer. Research in this area is currently of very high priority in the country. Several agencies have emerged as the primary sources of funding for cancer research in the country. These include the Indian Council of Medical Research, the Council of Scientific and Industrial Research, the Department of Science and Technology of the Government of India and the Department of Biotechnology of the Government of India. About 3,600 different institutes participated in this research in India over the period 2000–2011. Approximately 21 % of the total research output comes from various departments of Indian universities (Chemistry, Biology, Zoology, Biotechnology, Biochemistry, etc.), and 42 % from medical colleges and hospitals. The rest of the output is contributed by individual research institutes and by pharmaceutical companies. Examples of such institutes include Regional Cancer Centre, Thiruvananthapuram; Central Drug Research Institute, Lucknow; Indian Institute of Chemical Biology, Kolkata; Indian Institute of Chemical Technology, Hyderabad; Centre for Cellular and Molecular Biology, Hyderabad; Institute of Genomics & Integrative Biology, New Delhi; Indian Institute of Science, Bangalore; Bhabha Atomic Research Centre, Mumbai; Bose Institute, Kolkata; Central Institute of Medicinal and Aromatic Plants, Lucknow; Chittaranjan National Cancer Institute, Kolkata; and Gujarat Cancer & Research Institute, Ahmedabad. Examples of such companies include Ranbaxy Laboratories Limited, Dr. Reddy’s Laboratories Limited and Piramal Life Sciences Limited.

Figure 1 exhibits the cumulative growth of cancer research productivity in India over the period 2000–2011, measured in terms of publication volume and obtained from the Web of Science database. The average annual growth rate of productivity is about 18 %. Figure 2 shows the statewise distribution of the top-20 productive cancer research institutes in India, which together account for about 32 % of the total research output. As Fig. 2 shows, there are four institutes in Delhi and in Uttar Pradesh, three in Tamil Nadu and in West Bengal, two in Maharashtra and in Chandigarh, and one in Kerala, Karnataka and Andhra Pradesh. Figure 3 shows that the top-ranked institutions in cancer research contributed up to 6.5 % of the total national output in this area. However, very few institutions have produced more than 1 % of the total publication volume. Figure 4 shows the statewise proportions of multi-institute collaborative and single-institute papers. Only a few states (for example, Andhra Pradesh, Gujarat, Jammu and Kashmir, Rajasthan and West Bengal) have high proportions of multi-institute collaborative publications.

Fig. 1 Cumulative publication output in cancer research in India (cumulative total number of papers vs. year, 2000–2011)

Fig. 2 Statewise distribution of top-20 productive cancer research institutes in India (fraction of institutes per state: Andhra Pradesh, Chandigarh, Delhi, Karnataka, Kerala, Maharashtra, Tamil Nadu, Uttar Pradesh, West Bengal)

Fig. 3 Institutional proportion of publication and their ranking (proportion of total publication vs. publication-based institute ranking)

Fig. 4 Statewise share of multi-institute and single-institute publication (proportions of single-institute and multi-institute papers for each state)

The above results provide convincing evidence that India has grown into a very active centre of cancer research over the years. However, the actual underlying structure of collaborative cancer research in the country is not apparent from a simple examination of research output and productivity, and many questions remain unanswered. For example, who are the highly connected researchers? How dense is the research network? What are the distributions of individual productivities? Is there large-scale connectivity in the network? Does the network possess the small-world property (Milgram 1967), whereby new research ideas and information flow quickly through it? What is the extent of the overall cohesiveness of the network? How fragmented is the network? Is the network hierarchical in topology, and if so, is there a tradeoff between cohesion and hierarchy? Each of these questions has important implications for research policy in the broad area of cancer research in India. Formulating appropriate policies for the various areas of specialization and sub-disciplines of cancer research is necessary to ensure that the right environment exists to promote a vibrant cancer research community in the country. In the following pages, we examine these issues.

Research methodology

Many important constructs from mathematical graph theory are used in social network analysis (Scott 2000; Wasserman and Faust 1994). In this work, we construct the complete cancer research collaboration network using bibliometric data retrieved from Elsevier’s SciVerse Scopus and Thomson Reuters’ Web of Science (WoS) electronic databases. The search criteria employed to retrieve the data consist of the following keywords related to cancer research: “cancer, carcinoma, leukemia, oncology, tumor, neoplasm, colorectal cancer, ovarian cancer, prostate cancer, adenocarcinoma, androgen receptor, apoptosis, aneuploidy, atherosclerosis, glioblastoma, lymphoma, melanoma, metastasis, oncogenes, tumor suppressor genes”. The search covers the title, the article keywords and the abstract. All document types, such as articles, conference papers, reviews, letters, research notes, editorials, book chapters and abstract reports, are recorded for processing and analysis. Some of the network metrics examined in this work are computed using the Pajek software package (Batagelj et al. 2012); others are computed by employing fast numerical routines.
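As an illustration only, the keyword criterion above can be mimicked on already-downloaded records with a case-insensitive substring test over title, keywords and abstract. This is a hypothetical local sketch. The function name `matches_search` and the record fields are assumptions. It is not the query syntax of Scopus or WoS; those databases' own field operators would be used to retrieve the data in practice.

```python
# Hypothetical local filter mirroring the keyword search described in the text.
SEARCH_TERMS = (
    "cancer", "carcinoma", "leukemia", "oncology", "tumor", "neoplasm",
    "colorectal cancer", "ovarian cancer", "prostate cancer", "adenocarcinoma",
    "androgen receptor", "apoptosis", "aneuploidy", "atherosclerosis",
    "glioblastoma", "lymphoma", "melanoma", "metastasis", "oncogenes",
    "tumor suppressor genes",
)


def matches_search(title, keywords, abstract, terms=SEARCH_TERMS):
    """Return True if any search term occurs in the title, keywords or abstract.

    Matching is a case-insensitive substring test, so "tumor" also matches
    "tumors"; real database queries may tokenize or stem differently.
    """
    text = " ".join([title, *keywords, abstract]).lower()
    return any(term in text for term in terms)
```

Because matching is by substring, broad terms such as "cancer" already cover the compound terms in the list. The compound terms are kept only to mirror the stated criteria.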

The networks

The primary units of our analysis are cancer researchers in India, conceived of as nodes in a co-authorship network. An edge between two nodes signifies that the two researchers have co-authored papers. Mathematically, the network is represented as a collection of nodes and edges, $G = \{V, E\}$. Here $V = \{V_1, V_2, \ldots, V_n\}$ is a set of n nodes and $E = \{E_1, E_2, \ldots, E_m\}$ is a set of m edges, each linking two elements of V. A collaboration typically models a mutual bond between researchers, so any directionality of the bond is irrelevant. It is sufficient to know whether collaborative interactions leading to the publication of scholarly articles exist among the researchers. Our network is therefore undirected. If V is the set of researchers, their co-authorship network is characterized by a symmetric n × n adjacency matrix whose elements contain the number of papers co-authored by each pair of researchers. A network of this type is a weighted network. In this work, we consider its simpler, binary version. Its adjacency matrix contains only the elements 0 and 1, with 0 signifying the absence and 1 the presence of at least one co-authored paper.
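The following is a minimal Python sketch of how the weighted and binary versions of the network described above can be built from a list of papers. Each paper is given as a list of resolved author keys. The function name `build_coauthorship` and the input format are assumptions for illustration. The pair counts correspond to the weighted adjacency matrix, and the neighbour sets to the binary version analysed here.

```python
from collections import Counter
from itertools import combinations


def build_coauthorship(papers):
    """Build a co-authorship network from papers given as author-key lists.

    Returns (weights, adjacency):
      weights   -- Counter mapping a sorted author pair to the number of
                   papers the two co-authored (the weighted network);
      adjacency -- dict mapping each author to the set of co-authors
                   (the binary network: an edge means >= 1 joint paper).
    """
    weights = Counter()
    adjacency = {}
    for authors in papers:
        unique = sorted(set(authors))       # ignore duplicate listings on a paper
        for a in unique:
            adjacency.setdefault(a, set())  # single-author papers still add a node
        for a, b in combinations(unique, 2):
            weights[(a, b)] += 1
            adjacency[a].add(b)
            adjacency[b].add(a)
    return weights, adjacency
```

For example, the papers `[["A", "B", "C"], ["A", "B"], ["D"]]` give a weight of 2 for the pair A–B. They give three binary edges in total, and D is an isolated node.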

Author name resolution algorithm

The database sizes of Scopus and WoS differ, and so do the sizes of the datasets extracted for the present study. In neither dataset is it possible to determine the exact number of authors, primarily because an author’s name may appear differently in different papers. Furthermore, in many instances, two authors may have the same name. Additional disambiguation through home institutions, fields of specialization or disciplines is not foolproof either. Researchers may publish papers from more than one institution and in multiple disciplines or specialized areas (Newman 2001a). To account for name variations, we employ an algorithm in which two distinct versions of the network are constructed (Newman 2001a). In the first version (FI), an author is identified by his or her surname and first initial only. This method introduces an error by sometimes identifying two people as one, but it rarely fails to identify two name variants that truly belong to the same person. In the second version (AI), an author is identified by surname and all initials. This method distinguishes names more finely but introduces the opposite error. It identifies one person as two if their initials are given differently in different papers, thereby overestimating the number of authors in a database. The numbers of authors obtained in the two versions fix the lower and upper bounds of an interval [FI, AI]. This interval contains the actual, albeit practically unobservable, number of authors in the collaboration networks. Similar algorithms have also been used elsewhere (Milojević 2013).
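The two name-resolution versions can be sketched as two key functions over (surname, initials) pairs. The number of distinct keys under each gives the [FI, AI] interval. The helper names and the normalization (dropping dots and spaces from initials, lower-casing surnames) are assumptions, not the authors' exact rules.

```python
def _clean_initials(initials):
    """Normalize initials such as "B. M." to "BM" (assumed normalization)."""
    return initials.replace(".", "").replace(" ", "").upper()


def fi_key(surname, initials):
    """FI version: surname plus first initial only (may merge two people)."""
    return f"{surname.strip().lower()} {_clean_initials(initials)[:1]}"


def ai_key(surname, initials):
    """AI version: surname plus all initials (may split one person)."""
    return f"{surname.strip().lower()} {_clean_initials(initials)}"


def author_count_interval(names):
    """Return (FI count, AI count), the lower and upper bounds on authors."""
    fi = {fi_key(s, i) for s, i in names}
    ai = {ai_key(s, i) for s, i in names}
    return len(fi), len(ai)
```

Every AI key maps to exactly one FI key, so the FI count can never exceed the AI count. For example, "Gupta B. M." and "Gupta B." are merged under FI but kept apart under AI.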

Metrics and distributions

Paper and author counts

Because their overall coverage differs, the Scopus and WoS databases differ in size, and so do the data extracted from them for the present study. Author names have been resolved with the name-resolution algorithm described above. This yields an interval [FI, AI] of author counts, where FI (first initial) is the lower bound and AI (all initials) the upper bound. The two distributions that we examine in this work are (1) papers per author and (2) authors per paper.
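The mean values of these two distributions reduce to simple ratios over the list of papers. Both share the same numerator, the total number of author–paper pairs, divided by the number of distinct authors or by the number of papers. The following is a sketch that assumes each paper is a list of author keys already resolved under either the FI or the AI version; the function names are illustrative.

```python
from collections import Counter


def mean_papers_per_author(papers):
    """Average number of papers per distinct author."""
    counts = Counter(a for authors in papers for a in set(authors))
    return sum(counts.values()) / len(counts)


def mean_authors_per_paper(papers):
    """Average number of distinct authors per paper."""
    return sum(len(set(authors)) for authors in papers) / len(papers)
```

For the papers `[["A", "B", "C"], ["A", "D"]]`, there are five author–paper pairs, four authors and two papers. This gives 1.25 papers per author and 2.5 authors per paper. Merging names under FI lowers the author count and therefore raises the papers-per-author mean.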

Co-author distribution

In our networks, the degree (that is, the number of direct collaborators of a researcher) is a centrality metric. A researcher with a high degree is connected to many others in the network and may therefore be of some importance in a socio-professional context. In this regard, we examine whether our networks are scale-free. Some co-authorship networks in the real world have been found to be scale-free (Barabási et al. 2002), but there are exceptions as well (Newman 2001a).
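One common first check for a scale-free degree distribution is to plot its complementary cumulative distribution P(K ≥ k) on log–log axes and look for an extended straight-line tail. The following sketch computes that function from an adjacency dictionary of the binary network (author mapped to a set of co-authors). The function name and input format are illustrative assumptions.

```python
from collections import Counter


def degree_ccdf(adjacency):
    """Return sorted (k, P(K >= k)) pairs for the network's degree sequence."""
    degrees = [len(nbrs) for nbrs in adjacency.values()]
    n = len(degrees)
    counts = Counter(degrees)
    ccdf = []
    remaining = n  # number of nodes with degree >= current k
    for k in sorted(counts):
        ccdf.append((k, remaining / n))
        remaining -= counts[k]
    return ccdf
```

A straight tail on a log–log plot is only suggestive. The collaborator distributions discussed in the Results turn out to be better described by a power law with an exponential cutoff.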

Density and centralization

Density (ρ) is a measure of how well connected a network is. It is operationalized as the ratio of the number of edges in the network to the maximum possible number of edges (Wasserman and Faust 1994). For an undirected network with n nodes and m edges, this gives $\rho = 2m/[n(n-1)]$. The network’s centralization (σ) measures the extent to which collaborative ties are concentrated on central actors in the network. It has a high value when the network structure is bipolar, that is, divided into a few central researchers and a peripheral majority. An integrated examination of degree, density, centralization and cohesion provides information about the presence of structural holes in the network.¹
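The following sketch computes both measures for an undirected, binary network. The density follows the definition above. The text does not give a formula for centralization, so the code assumes Freeman's degree centralization. This variant is normalized by (n − 1)(n − 2) so that a star network scores 1 and a complete network 0. The variant reported by Pajek in the study may differ.

```python
def density(adjacency):
    """Edges present divided by the n(n-1)/2 possible edges (undirected)."""
    n = len(adjacency)
    if n < 2:
        raise ValueError("density needs at least two nodes")
    m = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    return 2 * m / (n * (n - 1))


def degree_centralization(adjacency):
    """Freeman degree centralization (assumed variant), in [0, 1]."""
    degrees = [len(nbrs) for nbrs in adjacency.values()]
    n = len(degrees)
    if n < 3:
        raise ValueError("centralization needs at least three nodes")
    k_max = max(degrees)
    return sum(k_max - k for k in degrees) / ((n - 1) * (n - 2))
```

For an undirected network the density also equals the mean degree divided by n − 1. This identity is a handy consistency check on tabulated values.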

Small world

A small world is a situation in which most researchers are not direct collaborators of one another. Instead, they are connected to one another through a small number of intermediate neighbours (Milgram 1967). Globally, the average distance l between pairs of actors scales as log(n). Because of this scaling, pairs tend to be connected by short paths through the network (Barabási et al. 2002; Newman 2001a, b; Watts and Strogatz 1998). Thus, research-related information does not have to travel far through a small-world collaboration network. A useful benchmark for comparison is a random network. For a random network of n nodes with mean degree h, Watts and Strogatz (1998) have shown that the average distance scales as $l_{rand} \sim \ln(n)/\ln(h)$. On the basis of earlier evidence, we hypothesize that our network possesses the small-world property. We test this by comparing the average distances in the cancer research collaboration network with those in the random benchmark.

¹ The holes are more accurately identified by considering densities and constraints in the ego networks of the individual researchers in the complete network.

Percolation

A giant cluster in a collaboration network is a large subset of researchers who are all connected to one another through ties with other intermediate researchers. In the absence of such a cluster, researchers are unable to collaborate beyond their local groups. When a giant cluster exists, a new researcher joining the network is overwhelmingly likely to be connected to it rather than to one of the small clusters. The sizes of the other clusters are small and are typically independent of the number of researchers in the network. We therefore compute an order parameter (γ), defined as the ratio of the size of the largest cluster to that of the complete network. Empirically, a value of 65 % or more is commonly used as a percolation threshold in the collaboration networks of many disciplines (Newman 2001a).

Cohesion and clustering

A socio-professional collaboration network should exhibit clustering (Albert and Barabási 2002; Newman 2001b). That is, there should exist tightly coupled groups of researchers with many internal collaborative ties and few external ones. In this work, we are not concerned with local, ego-centred clustering but only with cohesion at the level of the complete network. This large-scale cohesion is captured by a transitivity metric that characterizes the symmetry of interactions among transitive triads (Newman 2003).
By contrast, a random network should exhibit weak transitivity. The fraction of transitive triads in the complete network is the clustering coefficient (CC) (Watts and Strogatz 1998). It is the probability that two randomly selected collaborators of a focal researcher are themselves collaborators. Note that the density metric introduced above captures only the overall density of the complete network; it is a purely global measure. Global cohesion, by contrast, is characterized by the clustering of the complete network, measured in terms of maximally cohesive local triads. Thus, a globally sparse network may still contain local sections of high clustering. A random network of the same size is used as the benchmark for the cohesion of the co-authorship networks in this study. The clustering probability in the random network equals the probability that any two randomly selected researchers are co-authors. In this case, therefore, $CC_{rand} = h/n$, where h is the average collaboration degree and n is the total number of researchers in the network. For most collaboration networks studied so far, CC does not seem to decrease with n for constant h but remains largely independent of n (Albert and Barabási 2002; Newman 2001a, b; Watts and Strogatz 1998). In a random network, by contrast, $CC_{rand} \sim n^{-1}$, so the value of CC becomes exceedingly small in the limit of large network size (Watts and Strogatz 1998). The situation is usually different in a co-authorship network. Here, interconnected social ties tend to increase the value of CC, sometimes by a considerable margin (Newman 2001a). This happens because two actors who share a common friend or acquaintance are more likely to be acquainted with each other.

Core

Technically, a core reveals whether the high-degree nodes in a network are closely clustered or dispersed throughout the network. In this regard, it is important to consider the degrees of all nodes within the cluster. A core with a certain maximum and minimum degree thus identifies relatively dense sub-networks within the main network, which are the determinants of cohesive subgroups. In a k-core, therefore, every node has at least k neighbours within the core. To identify the areas of cancer research in India, we limit our consideration to the largest connected component (whether giant or not) of the main network.
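The global metrics described above can be sketched with breadth-first search on the binary adjacency structure:

- the average geodesic distance over connected pairs;
- the benchmark l_rand = ln(n)/ln(h);
- the order parameter γ;
- the global clustering (transitivity) coefficient, with its random benchmark CC_rand = h/n;
- the k-core, obtained by repeatedly deleting nodes of degree below k.

This is an illustrative pure-Python sketch that assumes small inputs, since exact all-pairs BFS costs O(nm). The authors used Pajek and their own numerical routines, so the function names and details here are assumptions.

```python
import math
from collections import deque


def bfs_distances(adjacency, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def giant_fraction(adjacency):
    """Order parameter gamma: largest connected component / network size."""
    seen, largest = set(), 0
    for s in adjacency:
        if s not in seen:
            component = set(bfs_distances(adjacency, s))
            seen |= component
            largest = max(largest, len(component))
    return largest / len(adjacency)


def average_distance(adjacency):
    """Mean geodesic distance over all connected (ordered) pairs."""
    total, pairs = 0, 0
    for s in adjacency:
        for t, d in bfs_distances(adjacency, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def l_rand(n, mean_degree):
    """Random-network benchmark for the average distance."""
    return math.log(n) / math.log(mean_degree)


def cc_rand(n, mean_degree):
    """Random-network benchmark for the clustering coefficient."""
    return mean_degree / n


def transitivity(adjacency):
    """Closed neighbour pairs / all neighbour pairs (= 3 x triangles / triples)."""
    closed, triples = 0, 0
    for nbrs in adjacency.values():
        nbr_list = list(nbrs)
        k = len(nbr_list)
        triples += k * (k - 1) // 2
        for i in range(k):
            for j in range(i + 1, k):
                if nbr_list[j] in adjacency[nbr_list[i]]:
                    closed += 1
    return closed / triples if triples else 0.0


def k_core(adjacency, k):
    """Nodes of the k-core: repeatedly delete nodes with fewer than k neighbours."""
    alive = {u: set(nbrs) for u, nbrs in adjacency.items()}
    stack = [u for u, nbrs in alive.items() if len(nbrs) < k]
    while stack:
        u = stack.pop()
        if u not in alive:  # already deleted via an earlier entry
            continue
        for v in alive.pop(u):
            if v in alive:
                alive[v].discard(u)
                if len(alive[v]) < k:
                    stack.append(v)
    return set(alive)
```

As a check against Table 1, the Scopus T1 (AI) values give `l_rand(15608, 10.521)` ≈ 4.103 and `cc_rand(15608, 10.521)` ≈ 6.741×10⁻⁴. Both match the tabulated benchmarks.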

Results

We now present the main results of this work, based on the co-authorship network structures of cancer researchers in India. Table 1 summarizes the metrics pertaining to the large-scale network topologies over the two time windows T1 (2000–2005) and T2 (2006–2011).

The Scopus data for AI show 21.5 % foreign participation in cancer research in India in period T1. Of this, the US share is 7.8 %, the UK share 1.6 % and the Japanese share 1.5 %. In T2, 27.1 % of the total AI authors are from outside India, of which 11.0 % are from the US, 1.6 % from the UK and 1.35 % from Japan. The WoS data over the period 2008–2011 show 48.62 % foreign co-authors, of which 9.5 % are from the US, 1.4 % from the UK and 1.3 % from Germany. WoS data from before 2008 are not clearly resolved for author names and institutional affiliations.

In Table 1, the first entries are the AI values and the second entries the FI values. The scaling and cutoff parameters of the collaborator distributions are computed for the AI networks only. In the following subsections, we discuss the individual results in terms of AI, pointing out the significance of the FI results where appropriate.

Papers and author counts

As Table 1 shows, the number of research papers indexed in Scopus nearly doubled from T1 to T2, while in WoS it increased about twelvefold. The number of cancer researchers more than doubled in both Scopus and WoS in T2.
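The growth factors quoted above can be checked directly against Table 1. The sketch below simply divides the T2 counts by the T1 counts, using the AI author counts.

```python
# Counts copied from Table 1: (T1, T2) for each database.
table1 = {
    "Scopus": {"papers": (8108, 16002), "authors_AI": (15608, 37567)},
    "WoS": {"papers": (765, 9208), "authors_AI": (10601, 28557)},
}


def growth(pair):
    """Ratio of the T2 value to the T1 value."""
    t1, t2 = pair
    return t2 / t1


for db, rows in table1.items():
    for quantity, pair in rows.items():
        print(f"{db:6s} {quantity:10s} T2/T1 = {growth(pair):.2f}")
```

It prints ratios of 1.97 (papers) and 2.41 (authors) for Scopus, and 12.04 and 2.69 for WoS.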

Papers per author

In both T1 and T2, the average numbers of papers per author in Scopus and WoS are close to each other. Note also that the FI values are larger than the AI ones. This is expected, since the FI algorithm identifies two authors as one in many cases, although it rarely splits a single individual into two. Figures 5, 6, 7 and 8 display the distribution functions P(k) of k or more papers per author on double-logarithmic scales. Logarithmic binning is used to reduce noise in the data. The exponents are obtained from the best power-law regression fits in the tails of the distributions. The Scopus exponent in T1 is 2.38, and the corresponding WoS exponent is 2.48. In T2, both the Scopus and the WoS exponents are close to 2.25. These values are of the order of Lotka’s estimate (Lotka 1926). In such distributions, the majority of researchers publish a small number of papers. A few publish a large number, so the distribution extends its tail to large values of k. Since each time window in our study spans only six years, researchers are not expected to publish a very large number of papers within it, and the tails of the distributions are cut off at a finite k. Had we used a longer period, we might have observed even larger numbers of papers published by a few researchers, resulting in distributions with even longer tails.

Table 1 Statistics of metrics from the cancer research networks in India

Quantity                          Scopus T1        Scopus T2        WoS T1           WoS T2
Total number of papers            8,108            16,002           765              9,208
Total number of authors (AI)      15,608           37,567           10,601           28,557
Total number of authors (FI)      13,198           31,125           9,129            23,640
Average papers per author (AI)    2.370 (4.649)    2.698 (7.489)    2.261 (3.807)    2.721 (6.776)
Average papers per author (FI)    2.798 (6.436)    3.252 (10.421)   2.620 (5.126)    3.281 (9.489)
Average authors per paper (AI)    4.553 (2.857)    6.336 (5.007)    2.200 (0.928)    2.436 (1.003)
Average authors per paper (FI)    4.553 (2.857)    6.325 (4.996)    2.197 (0.930)    2.431 (1.003)
Collaborators per author (AI)     10.521 (17.293)  21.872 (50.287)  13.795 (32.327)  15.053 (32.931)
Collaborators per author (FI)     12.151 (22.266)  25.813 (64.829)  15.715 (35.508)  17.624 (38.966)
Exponent a (AI)                   1.534            1.597            1.687            1.598
Cutoff k_c (AI)                   3,218.2          17,869           4,781            9,304.5
Density ρ (AI)                    6.74×10⁻⁴        5.82×10⁻⁴        1.30×10⁻³        5.27×10⁻⁴
Density ρ (FI)                    9.21×10⁻⁴        8.29×10⁻⁴        1.72×10⁻³        7.46×10⁻⁴
Centralization σ (AI)             0.977            0.993            0.952            0.984
Centralization σ (FI)             0.98             0.991            0.952            0.982
Average distance l (AI)           4.309            3.607            4.838            5.258
l_rand (AI)                       4.103            3.414            3.532            3.784
Average distance l (FI)           3.884            3.338            4.208            3.857
l_rand (FI)                       3.799            3.182            3.311            3.509
Giant cluster size (AI)           13,453           36,042           8,790            25,493
Giant cluster size (FI)           12,065           30,296           8,206            22,306
Order parameter γ, % (AI)         86.193           95.940           82.901           89.274
Order parameter γ, % (FI)         91.415           97.352           89.889           94.361
CC (AI)                           3.572×10⁻¹       3.290×10⁻¹       8.907×10⁻¹       6.519×10⁻¹
CC_rand (AI)                      6.741×10⁻⁴       5.819×10⁻⁴       1.301×10⁻³       5.271×10⁻⁴
CC (FI)                           2.728×10⁻¹       2.646×10⁻¹       8.467×10⁻¹       5.610×10⁻¹
CC_rand (FI)                      9.207×10⁻⁴       8.293×10⁻⁴       1.721×10⁻³       7.463×10⁻⁴

Fig. 5 Papers per author (Scopus), T1 (log–log plot of number of authors vs. number of papers; fitted exponent = 2.376)

Fig. 6 Papers per author (Scopus), T2 (log–log plot of number of authors vs. number of papers; fitted exponent = 2.247)
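The exponents above come from regression fits to log-binned data. The following is a minimal sketch of such a procedure under these assumptions:

- logarithmic bins of ratio `base`, starting at `k_min`;
- counts converted to densities by dividing by sample size and bin width;
- an ordinary least-squares line fitted on log–log axes through bins that hold at least `min_count` values.

The function names, the default bin ratio and the `min_count` threshold are assumptions; the authors' exact binning and fitting range are not stated. `sample_pareto` only generates synthetic test data.

```python
import math
import random


def sample_pareto(alpha, n, seed=0):
    """Synthetic continuous sample with p(x) ~ x^(-alpha) for x >= 1 (testing only)."""
    rng = random.Random(seed)
    return [(1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def log_binned_exponent(values, k_min=1.0, base=2.0, min_count=5):
    """Estimate alpha in p(k) ~ k^(-alpha) from the tail k >= k_min.

    Values are grouped into bins [k_min*base^j, k_min*base^(j+1)); each count
    is divided by (sample size * bin width) to give a density, and a straight
    line is fitted to log(density) against the log of each bin's geometric
    midpoint. Sparse bins (count < min_count) are skipped as too noisy.
    """
    tail = [v for v in values if v >= k_min]
    n = len(tail)
    largest = max(tail)
    edges = [k_min]
    while edges[-1] <= largest:
        edges.append(edges[-1] * base)
    xs, ys = [], []
    for lo, hi in zip(edges, edges[1:]):
        count = sum(1 for v in tail if lo <= v < hi)
        if count >= min_count:
            xs.append(math.log(math.sqrt(lo * hi)))
            ys.append(math.log(count / (n * (hi - lo))))
    if len(xs) < 2:
        raise ValueError("not enough populated bins for a fit")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```

On synthetic data drawn with exponent 2.5, the estimate falls close to 2.5. For integer data such as paper counts, bins of ratio 2 starting at 1 hold 1, 2, 4, … integers, so the width normalization stays consistent. Maximum-likelihood estimators are a common alternative to regression on binned data.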

Fig. 7 Papers per author (WoS), T1 (log–log plot of number of authors vs. number of papers; fitted exponent = 2.476)

Fig. 8 Papers per author (WoS), T2 (log–log plot of number of authors vs. number of papers; fitted exponent = 2.249)

Authors per paper

Some researchers have found a growing tendency toward co-authored publication (Acedo et al. 2006; Leydesdorff and Wagner 2008). In contrast, we find in both T1 and T2 that the average number of authors per paper lies in the small range of 2–6 for both the Scopus and the WoS databases. The differences between the two databases are largely due to their different coverage of cancer research papers. The distributions of the numbers of authors per paper are shown in Figs. 9, 10, 11 and 12. In T1, the Scopus exponent is close to 2.27, and the corresponding WoS exponent is about 2.26.

Fig. 9 Authors per paper (Scopus), T1 (log–log plot of number of papers vs. number of authors; fitted exponent = 2.265)

Fig. 10 Authors per paper (Scopus), T2 (log–log plot of number of papers vs. number of authors; fitted exponent = 2.368)

The T2 values are also shown in the figures. The power law is well resolved in the tails of the distributions. However, the distributions exhibit considerable initial curvature, and power-law forms do not fit the data very well in this region. The curvature may have arisen from our finite time cutoff. The networks may also have grown over the years. They are constructed from a six-year window under the assumption that they remain essentially static over this period, and the curvature may indicate a departure from this assumption. This point is not explored further in this work.

Fig. 11 Authors per paper (WoS), T1 (log–log plot of number of papers vs. number of authors; fitted exponent = 2.258)

Fig. 12 Authors per paper (WoS), T2 (log–log plot of number of papers vs. number of authors; fitted exponent = 2.658)

Collaborators per author

In T1, the average numbers of collaborators per author (that is, the mean degrees) in the Scopus and WoS networks are approximately 10.52 and 13.80, respectively. The difference between them is attributable to the differing coverage of the two databases. In T2, the corresponding values are approximately 21.87 and 15.05. These mean degrees are higher than those computed for collaboration networks in other disciplines (Barabási et al. 2002; Newman 2001a, b). The primary reason is that the field is dominated by lab-based, experimental research, which involves larger research teams than disciplines driven by theoretical work (for example, mathematics, mathematical physics, and theoretical high energy physics).

The collaborator distributions are shown in Figs. 13–16. All of them are long-tailed, but they also exhibit some curvature on the log–log scale. They are therefore not pure power laws; rather, they appear to follow truncated power laws of the form $p(k) \sim k^{-a} e^{-k/k_c}$, where $a$ and $k_c$ are parameters. Similar effects have been observed earlier (Barabási et al. 2002; Newman 2001a). As for the earlier distributions, the fits are performed on the tails and are shown as lines on the log–log plots. All of the power laws are cut off at finite values of $k_c$, proportional to the maximum degrees of the distributions. Similar results have been reported earlier by Newman (2001a, b), who, for example, obtained a very large cutoff value of 5,800 for the MEDLINE biomedicine database.

Fig. 13 Collaborators per author (Scopus) T1 [log–log plot; x-axis: number of collaborators, y-axis: number of authors]

Fig. 14 Collaborators per author (Scopus) T2 [log–log plot; x-axis: number of collaborators, y-axis: number of authors]

Fig. 15 Collaborators per author (WoS) T1 [log–log plot; x-axis: number of collaborators, y-axis: number of authors]

Fig. 16 Collaborators per author (WoS) T2 [log–log plot; x-axis: number of collaborators, y-axis: number of authors]

Density and centralization

In the collaboration networks studied here, the density is small, typically in the range $10^{-4}$ to $10^{-3}$. Because the density of an undirected network with $n$ nodes and mean degree $\langle k \rangle$ equals $\langle k \rangle/(n-1)$, it keeps falling as researchers are added over the years unless each researcher's number of collaborators grows in proportion to $n$. Such proportional growth seems unlikely: the number of collaborators of a particular researcher depends on the time and commitment that he or she is willing to devote to research, not on how many new individuals join the network over a given period.

Although our networks are decidedly sparse, similar density scores do not necessarily reflect identical internal structures for the Scopus and WoS networks; one must additionally examine the mean degrees and the centralization. For both Scopus and WoS, the r values are above 95 % in both T1 and T2. This implies that the underlying network structure is highly centralized with respect to density. This may be attributed to the presence of a few influential researchers in universities, private research laboratories, or government research institutes who train graduate students and postdoctoral fellows over long periods. The groups and subgroups formed in this way constitute small research circles with a high degree of internal cohesion, but their connections with the outside research world may be few. The density and the associated centralization must therefore be examined together with the clustering of the network and the size of its giant cluster (both discussed below).

Average distance

In both T1 and T2, the average geodesic distance between pairs of researchers in the network lies approximately in the range 3–5, which is in the small-world domain (Watts and Strogatz 1998). In T2, the distance decreased by about 23 % in Scopus and increased by about 8 % in WoS. A small average distance does not, of course, imply that any two researchers are directly involved in joint research.
Nevertheless, it does indicate that research knowledge can propagate effectively through the socio-professional ties binding researchers together in a network where a large connected community exists. Furthermore, almost all of our results are quite close to the $l_{\mathrm{rand}}$ benchmark, the average distance expected in a random network with the same number of nodes and mean degree. By this criterion, our network exhibits the small-world property.

Giant cluster

The size of the giant cluster is recorded in Table 1 for Scopus and WoS in both T1 and T2, together with its size as a percentage of the complete network. About 80–90 % of Indian cancer researchers are connected to it in both periods. For both Scopus and WoS, in most cases the giant cluster obtained with the first-initial (FI) algorithm is slightly larger than the one obtained with the all-initials (AI) algorithm. The reason is that the FI algorithm sometimes wrongly identifies two different researchers who share a surname and first initial as one individual. When the two researchers belong to different clusters before merging, the merge fuses those clusters, so some otherwise disconnected researchers become part of the FI giant cluster. By contrast, if both already belong to the same cluster, merging removes one node from it, so in that case the AI giant cluster is the larger one.

Clustering

As seen in Table 1, the WoS value of the clustering coefficient (CC) is about 149.4 % higher than the Scopus value in T1, and about 146.4 % higher in T2. Moreover, in both T1 and T2, the CC values in our networks are considerably larger than those of random networks. This substantiates our contention that the cancer research collaboration network in India is a genuinely social network, in which strongly interconnected socio-professional ties between researchers produce a highly cohesive topological structure.

Core

The number of cores increased between the two periods, more than doubling in Scopus (from 25 to 56 under AI and from 25 to 58 under FI). As seen from Tables 1 and 2, the giant cluster in Scopus is larger than in WoS, whereas the largest core in WoS holds a bigger share of researchers than that in Scopus; this is primarily due to coverage differences between the databases. In both databases, the largest cores are much larger than the smallest ones. Table 2 shows that roughly 10 % of cancer researchers (between 6.7 % and 13.1 %) are closely connected in the largest core.

Table 2 Share of largest and smallest core of researchers and number of cores in giant cluster in cancer research in India (TOP = largest core, BOTTOM = smallest core; number of cores in parentheses)

                  Scopus                       WoS
                  T1           T2              T1           T2
TOP      AI       0.120 (25)   0.070 (56)      0.131 (29)   0.129 (45)
         FI       0.117 (25)   0.067 (58)      0.120 (30)   0.116 (46)
BOTTOM   AI       0.001 (25)   0.001 (56)      0.002 (29)   0.001 (45)
         FI       0.002 (25)   0.001 (58)      0.002 (30)   0.001 (46)
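The effect of the two author-identification schemes on the giant cluster, described in the Giant cluster paragraph, can be reproduced on a toy example. The sketch below assumes a simplified author key (lower-cased surname plus either all initials, AI, or only the first initial, FI); the paper's actual FI and AI rules may differ, and every name, record, and function here is hypothetical. In the first toy dataset, two distinct researchers who share a surname and first initial sit in different clusters, so FI merging fuses those clusters; in the second, they already share a cluster, so FI merging removes one node from it.

```python
from itertools import combinations


def name_key(surname, initials, mode):
    """Hypothetical author key: lower-cased surname plus all initials (AI) or the first one (FI)."""
    letters = [ch for ch in initials.upper() if ch.isalpha()]
    if mode == "FI":
        letters = letters[:1]
    return f"{surname.strip().lower()}, {''.join(letters)}"


def build_graph(papers, mode):
    """Co-authorship graph as a dict: author key -> set of co-author keys."""
    adj = {}
    for authors in papers:
        keys = {name_key(surname, initials, mode) for surname, initials in authors}
        for key in keys:
            adj.setdefault(key, set())
        for a, b in combinations(keys, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def giant_cluster_size(adj):
    """Number of nodes in the largest connected component (iterative depth-first search)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best


# Hypothetical toy records: (surname, initials) per author, one list per paper.
# R.K. Sharma and R.P. Sharma are different people in different clusters.
PAPERS_BRIDGE = [
    [("Sharma", "R.K."), ("Gupta", "A.")],
    [("Gupta", "A."), ("Rao", "S.")],
    [("Sharma", "R.P."), ("Singh", "V.")],
    [("Singh", "V."), ("Das", "P.")],
]
# A. Kumar and A.K. Kumar are different people already in the same cluster.
PAPERS_SAME = [
    [("Kumar", "A."), ("Nair", "B.")],
    [("Nair", "B."), ("Kumar", "A.K.")],
]

if __name__ == "__main__":
    for label, papers in (("different clusters", PAPERS_BRIDGE), ("same cluster", PAPERS_SAME)):
        for mode in ("AI", "FI"):
            adj = build_graph(papers, mode)
            size = giant_cluster_size(adj)
            print(f"{label}, {mode}: giant cluster {size} of {len(adj)} nodes")
```

With the bridging data, the AI giant cluster has 3 of 6 nodes while the FI one has all 5 nodes, because the two Sharmas collapse into a single bridging node. With the same-cluster data, the AI giant cluster has 3 nodes and the FI one only 2, which is the case in which AI yields the larger cluster.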

Discussion of results: implications for research policy

In this paper we have analyzed patterns of collaboration among cancer researchers in India through the papers they co-authored. This section discusses the major policy implications that follow from the findings presented in the previous section. In particular, we examine how appropriate policies might be formulated on the basis of the theoretical principles of network topology and the emergent patterns of behavior in cancer research collaboration in India. A proper understanding of some of these issues requires a closer look at the microstructure of the underlying collaboration processes, much of which is dynamical in origin; these problems will be addressed in a future communication.

Fragmentation, and the bridges between small clusters, determine whether a collaboration network has large-scale connectivity. Our analysis has shown that the cancer research network in India is well connected over the entire period covered by this study, and that it contains a number of highly productive researchers. However, there are also signs of internal fragmentation: small clusters of researchers are tied to one or more highly central researchers. This local dominance is one reason for the creation of structural holes in the network (Burt 2004). Within any such cluster, a certain degree of structural cohesion will persist as long as the central researcher remains active in holding the cluster together. Policy makers must recognize that isolated clusters are the source of several problems, the most critical being the breakdown of connectivity. These problems become more severe when, over the years, the central researchers retire or move on to unrelated areas of research.

To alleviate these conditions, research policy makers should institute means to encourage the formation of inter-cluster bridges, for which institutional and governmental incentives might be highly beneficial. It may also be good practice to organize a few national conferences and at least one international conference every year, providing an environment in which Indian researchers can interact freely at a socio-professional level. This would raise the likelihood of forming bridges between clusters of researchers from the many specialized areas of cancer research in the country.

We now also have a better understanding of how collaboration size affects the structure of the cancer research network in India. Closely tied to the overall size of the collaboration is the average geodesic distance in the network, which is also a good indicator of network connectivity. Our results show that the mean number of collaborators per researcher ranges from about 10.5 to 21.9, whereas the number of authors per paper is comparatively low, typically 2–5. A researcher therefore accumulates collaborators across many small-team papers, which reflects a strong tendency for individual research problems to remain confined within small groups of researchers. Although this practice may be useful in the short term for solving highly specialized problems, it risks becoming increasingly isolated from mainstream cancer research over time, and it limits the scope for finding innovative solutions. Research policy in this respect should focus on integrating isolated research clusters with clusters that have broader research perspectives. This could be achieved even through weak coupling between the clusters.
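To make the contrast between authors per paper and collaborators per author concrete, the following minimal Python sketch builds a co-authorship graph from a toy list of papers and computes the mean number of authors per paper, the mean degree $\langle k \rangle$, the density $\langle k \rangle/(n-1)$, and the average geodesic distance. For the random benchmark it uses the standard estimate $l_{\mathrm{rand}} \approx \ln n / \ln \langle k \rangle$ for a random graph with the same $n$ and $\langle k \rangle$; the text does not spell out how $l_{\mathrm{rand}}$ was computed, so this is an assumption. The paper list and all function names are hypothetical.

```python
import math
from collections import deque
from itertools import combinations


def coauthor_graph(papers):
    """Adjacency dict: author -> set of distinct co-authors (repeat collaborations count once)."""
    adj = {}
    for authors in papers:
        unique = set(authors)
        for author in unique:
            adj.setdefault(author, set())
        for a, b in combinations(unique, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def mean_authors_per_paper(papers):
    """Average number of distinct authors on a paper."""
    return sum(len(set(p)) for p in papers) / len(papers)


def mean_degree(adj):
    """Mean number of distinct collaborators per author, <k>."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def density(adj):
    """Share of all author pairs that are linked: 2m / (n(n-1)) = <k> / (n-1)."""
    return mean_degree(adj) / (len(adj) - 1)


def average_distance(adj):
    """Mean geodesic distance over all connected ordered pairs, via BFS from every node."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())  # distances from source to every node it can reach
        pairs += len(dist) - 1       # exclude the source itself
    return total / pairs


def l_rand(adj):
    """Assumed random-graph benchmark ln(n) / ln(<k>); only meaningful when <k> > 1."""
    return math.log(len(adj)) / math.log(mean_degree(adj))


if __name__ == "__main__":
    # Hypothetical toy data: five small papers among five authors.
    papers = [["A", "B", "C"], ["A", "D"], ["A", "E"], ["B", "D"], ["C", "E"]]
    g = coauthor_graph(papers)
    print(f"authors per paper      = {mean_authors_per_paper(papers):.2f}")  # 2.20
    print(f"collaborators / author = {mean_degree(g):.2f}")                  # 2.80
    print(f"density                = {density(g):.2f}")                      # 0.70
    print(f"average distance       = {average_distance(g):.2f}")             # 1.30
    print(f"l_rand estimate        = {l_rand(g):.2f}")                       # 1.56
```

In the toy data each paper has only 2.2 authors on average, yet authors have 2.8 distinct collaborators on average, because author A accumulates co-authors across several small papers; this is the same mechanism behind the contrast discussed above. For networks with tens of thousands of authors, running BFS from every node is costly, and averaging over a random sample of source nodes is a common shortcut.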
As seen previously, the average distances in our networks are in the small-world range. From the perspective of research policy, this, together with the existence of giant clusters, is a critical element of healthy research collaboration. If a collaboration network percolates with a functioning giant cluster, researchers can quickly connect with one another to exchange information, discuss ideas, and even speculate on the implications of new findings. Whether this atmosphere is ultimately conducive to building new collaborations between researchers (in effect, raising the clustering probability) is a critical issue for formulating effective research policies. The simultaneous presence of a high average degree and moderately high clustering indicates that the cancer research networks in India are not overly hierarchical in topology. Using the MEDLINE database for biomedical research, Newman (2001a) found that collaboration in biomedicine was highly hierarchical. He proposed a plausible scenario in which a principal investigator supervising graduate students and postdoctoral fellows publishes papers with them on different projects, thereby raising the number of co-authors per paper. Because clustering in the Indian network is fairly high, however, the hierarchical structure is not prominent, although the Scopus data do show a weak tendency towards hierarchy. Historically as well as culturally, research collaboration in many scientific disciplines in India is dominated by a small group of influential senior researchers, which tends to make the corresponding networks hierarchical. The problem with this type of topology is that it breaks down easily: the loss of a few central researchers can disconnect the groups that depend on them. Good research policy should therefore aim to reduce the hierarchical character of collaboration and make it more decentralized and distributed.
Providing incentives to young researchers is worthwhile, as is incentivizing inter-institutional collaborative research.

The present study also contributes to existing theory by examining the circumstances surrounding the formation of transitive triads in cancer research collaboration in India. In general, the collaboration network is reasonably well clustered, implying that collaboration among three or more researchers is common in cancer research in the country. In other words, a researcher who has collaborated with two others has a substantial probability of bringing those two to collaborate with each other. This probability increases when socio-professional ties among the researchers are strong, which frequently happens because researchers belong to the same research circle, work in the same institution, attend the same conferences and workshops or, perhaps, serve on the same editorial boards. However, since the network is also somewhat fragmented, many of its small clusters lack inter-cluster bridges. As regards research policy, it may be worthwhile to provide external incentives to encourage active interdisciplinary cancer research in the country. As added incentives, the researchers' home institutions or the government's research funding agencies may create opportunities for inter-institutional, multidisciplinary research in cancer biology and biomedicine (Hara et al. 2003).

The giant cluster in cancer research collaboration in India is of functional size, and the network operates well within the percolating regime. Throughout the 12-year period covered by this study, the network lies above the percolation threshold, with a single giant cluster clearly separated from many small clusters. From the perspective of research policy, this is a good sign, since large-scale connectivity is a necessary prerequisite for reaping the benefits of a small world. Since the network percolates, the next step is to institute an effective policy for building bridges to the still-unconnected small clusters, so that as much as possible of the remaining network becomes connected.
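The probability described above, that two collaborators of the same researcher have also collaborated with each other, is what the global clustering coefficient (transitivity) measures. The minimal sketch below computes it from a toy co-authorship list and compares it with the value expected in an Erdős–Rényi random graph with the same number of nodes and mean degree, which equals the edge probability $\langle k \rangle/(n-1)$. We assume the global variant here; whether the paper's CC is this or the average of local coefficients is not stated in this section. The chain-of-papers data and function names are hypothetical.

```python
from itertools import combinations


def coauthor_graph(papers):
    """Adjacency dict: author -> set of distinct co-authors."""
    adj = {}
    for authors in papers:
        unique = set(authors)
        for author in unique:
            adj.setdefault(author, set())
        for a, b in combinations(unique, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def transitivity(adj):
    """Global clustering: share of connected triples u-v-w whose ends u and w are also linked.

    Every triangle is counted once per corner, i.e. three times, so this equals
    3 * (number of triangles) / (number of connected triples).
    """
    closed = triples = 0
    for nbrs in adj.values():
        for u, w in combinations(nbrs, 2):
            triples += 1
            if w in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0


def random_clustering(adj):
    """Expected clustering of an Erdos-Renyi graph with the same n and <k>: p = <k> / (n - 1)."""
    n = len(adj)
    mean_k = sum(len(nbrs) for nbrs in adj.values()) / n
    return mean_k / (n - 1)


if __name__ == "__main__":
    # Hypothetical toy data: ten three-author papers, consecutive papers sharing one author.
    papers = [[f"R{2 * i}", f"R{2 * i + 1}", f"R{2 * i + 2}"] for i in range(10)]
    g = coauthor_graph(papers)
    print(f"C      = {transitivity(g):.3f}")       # 0.455 (30 closed of 66 triples)
    print(f"C_rand = {random_clustering(g):.3f}")  # 0.143 (= 1/7)
```

For this toy chain, 30 of 66 connected triples are closed, giving about 0.45 against a random expectation of 1/7, about 0.14: small teams produce closed triangles far more often than random pairing would, which is the pattern the text attributes to strong socio-professional ties.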
Our study also contributes to the understanding of power-law behavior in the collaboration distributions for cancer research in India. We have seen, for example, that the distributions of the number of collaborators per researcher and of the number of papers per researcher follow power laws with varying exponents. Interestingly, these power laws are imperfect: they curve away from a straight line on log–log scales. Thus, although some highly connected researchers exist in the networks, cancer research collaboration in India is dominated by the majority of researchers who have relatively few collaborators. Overall, the networks are much larger in T2 than in T1 (in terms of the numbers of papers, authors, and collaborators), but their static topologies are essentially similar in the two periods. There is nevertheless some indication of latent growth in the network. From the perspective of research policy, it would be useful to examine the growth rate and determine whether it is currently accelerating or slowing; an accelerating rate of growth would be a sign of good health for cancer research collaboration in the country. It is important to note in this connection that similar network structures of comparable size have appeared in other research fields as well. For example, in earlier work (Ghosh and Kshitij 2014), we investigated several large-scale characteristics of dichotomous co-authorship networks built from papers published in peer-reviewed journals and conference proceedings in management and information, including the related areas of information technology and economics. Earlier still, similar, albeit not identical, collaboration network structures were found in biomedicine, several subfields of physics, mathematics, and computer science (Barabási et al. 2002; Newman 2001a, b).
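Taking logarithms of the truncated power law $p(k) \sim k^{-a} e^{-k/k_c}$ gives $\ln p(k) = c - a \ln k - k/k_c$, which is linear in $\ln k$ and $k$, so the two parameters can be estimated by ordinary least squares on the tail of a distribution. The sketch below is a minimal illustration under that assumption: the paper does not state its fitting procedure here, so the log-space least-squares method, the `k_min` tail threshold, and the function names are our own, and the synthetic data are not the paper's.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_truncated_power_law(ks, pks, k_min=1):
    """Least-squares fit of ln p(k) = c - a ln k - k / k_c on the tail k >= k_min.

    Returns (a, k_c, c). Zero frequencies are skipped because ln 0 is undefined.
    The k regressor is divided by max(k) to keep the normal equations well conditioned.
    """
    data = [(k, p) for k, p in zip(ks, pks) if k >= k_min and p > 0]
    k_scale = max(k for k, _ in data)
    rows = [(1.0, math.log(k), k / k_scale) for k, _ in data]
    ys = [math.log(p) for _, p in data]
    # Normal equations (X^T X) beta = X^T y for the three coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    c, b_log, b_lin = solve_linear(xtx, xty)
    # b_lin = -k_scale / k_c, so k_c = -k_scale / b_lin (a real cutoff needs b_lin < 0).
    return -b_log, -k_scale / b_lin, c


if __name__ == "__main__":
    # Synthetic, noise-free data with a = 2 and k_c = 50 (illustrative only).
    ks = list(range(1, 201))
    pks = [k ** -2.0 * math.exp(-k / 50) for k in ks]
    a, k_c, c = fit_truncated_power_law(ks, pks, k_min=5)
    print(f"a = {a:.3f}, k_c = {k_c:.1f}")  # prints: a = 2.000, k_c = 50.0
```

Least squares on log frequencies is simple but sensitive to noisy, sparsely populated tail bins; maximum-likelihood estimation is generally more reliable for empirical degree distributions.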
A related issue in cancer research collaboration is how many collaborative projects involve partners in other countries or regions, such as the US, the European Union, and Southeast Asia. Major international conferences are a good starting point for new international collaborations. Inter-country analysis of cancer research collaboration is not addressed in this paper but will be taken up in future work.


Conclusions

Our analysis of the structure of cancer research collaboration networks in India has revealed a number of interesting characteristics. We have also explained how the results can inform research policy measures to improve research collaboration in the country. These benefits notwithstanding, the present study has a number of limitations, and overcoming some of them will open new possibilities for research in this area. The first concern is the data. Our data span only a limited time window of author attribution in scholarly articles published in peer-reviewed journals and conference proceedings; in particular, we used only six years' worth of data to construct the collaboration network for each period. Newman (2001a, b) showed that such a limited time window does not correctly reproduce the power-law form of scientific productivity through collaboration, because the limited window size introduces an inevitable cutoff in the frequency distributions concerned. In principle, this limitation could be addressed by including data from earlier years; however, there is not much evidence of cancer research in India prior to 1995. Next, we have not directly studied the growth of the collaboration network. Instead, we have treated the network as essentially static over each 6-year window. This assumption is probably not correct in general, but it may be treated as a first-order approximation to the underlying network dynamics. There is evidence in the data of some hidden growth in the network; one indication is the curvature exhibited by some of the distributions discussed in the Results section. We propose to address the attendant issues of network growth in a future paper.
Not including the institutional affiliations of researchers in the present study has also somewhat reduced the accuracy of our numerical results. In this connection, it is also necessary to recognize some debilitating effects of centrality in the networks. In a country like India, power distance is high (Hofstede 1983). It is therefore important to examine whether certain influential senior researchers dominate specific areas of research with their own interests, beliefs, or styles of work; such an investigation will have interesting cultural implications. Dominance exercised by influential researchers sometimes goes a long way towards building strong internal relationships within small collaboration clusters. Unfortunately, excessive local dominance can also fragment the network, preventing group members from benefiting from a free exchange of ideas with peers outside their immediate circles of collaboration. This tendency may eventually lead to a lack of variety in research directions. Some of these cluster-specific considerations can be explored further through an ego-centered examination of the network characteristics. This requires extensive case- and survey-based data collection, which is beyond the scope of the present paper. Work in this direction is underway, and the results will be reported in a future paper.

Acknowledgments

The authors thank two anonymous reviewers of Scientometrics for suggesting improvements to the paper. Thanks are also due to Dr. P. Banerjee, Director of CSIR – NISTADS, for helpful comments on an early version of the paper. Jaideep Ghosh would like to thank the Department of Science & Technology, Government of India, for financial support to carry out this work.


References

Acedo, F. J., Barroso, C., Casanueva, C., & Galán, J. L. (2006). Co-authorship in management and organizational studies: An empirical and network analysis. Journal of Management Studies, 43(4), 957–983.

Albert, R., & Barabási, A.-L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1), 47–97.

Barabási, A.-L., Jeong, H., Ravasz, R., Néda, Z., Vicsek, T., & Schubert, A. (2002). On the topology of the scientific collaboration networks. Physica A, 311, 590–614.

Batagelj, V., Mrvar, A., & Zaveršnik, M. (2012). Obtained from http://pajek.imfm.si/doku.php. Accessed 2 Jan 2014.

Bozeman, B., & Corley, E. (2004). Scientists' collaboration strategies: Implications for scientific and technical human capital. Research Policy, 33(4), 599–616.

Burt, R. S. (2004). Structural holes and good ideas. American Journal of Sociology, 110(2), 349–399.

Eckhouse, S., Lewison, G., & Sullivan, R. (2008). Trends in the global funding and activity of cancer research. Molecular Oncology, 2(1), 20–32.

Ghosh, J., & Kshitij, A. (2014). An integrated examination of collaboration co-authorship networks through structural cohesion, holes, hierarchy, and percolating clusters. Journal of the American Society for Information Science and Technology. Early view http://onlinelibrary.wiley.com/doi/10.1002/asi.23058/abstract. doi:10.1002/asi.23058. Accessed 3 April 2014.

Granovetter, M. S. (1973). The strength of weak ties. American Journal of Sociology, 78(6), 1360–1380.

Hara, N., Solomon, P., Kim, S.-L., & Sonnenwald, D. H. (2003). An emerging view of scientific collaboration: Scientists' perspectives on collaboration and factors that impact collaboration. Journal of the American Society for Information Science and Technology, 54(10), 952–965.

Hofstede, G. (1983). National culture in four dimensions: A research-based theory of cultural differences among nations. International Studies of Management and Organization, 13(1–2), 46–74.

Lewison, G., Purushotham, A., Mason, M., McVie, G., & Sullivan, R. (2010). Understanding the impact of public policy on cancer research: A bibliometric approach. European Journal of Cancer, 46(5), 912–919.

Leydesdorff, L., & Wagner, C. S. (2008). International collaboration in science and the formation of a core group. Journal of Informetrics, 2(4), 317–332.

Lotka, A. J. (1926). The frequency distribution of scientific productivity. Journal of the Washington Academy of Science, 16(2), 317–323.

Melin, G., & Persson, O. (1996). Studying research collaboration through co-authorships. Scientometrics, 36(3), 363–377.

Milgram, S. (1967). The small-world problem. Psychology Today, 1(1), 61–67.

Milojević, S. (2013). Accuracy of simple, initials-based methods for author name disambiguation. Journal of Informetrics, 7(4), 767–773.

Newman, M. E. J. (2001a). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2), 404–409.

Newman, M. E. J. (2001b). Scientific collaboration networks. I. Network construction and fundamental results. Physical Review E, 64, 016131.

Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review, 45(2), 167–256.

Scott, J. (2000). Social network analysis: A handbook. London: Sage Publications.

Shrum, W., Genuth, J., & Chompalov, I. (2007). Structures of scientific collaboration. Cambridge, Massachusetts: MIT Press.

Wasserman, S., & Faust, K. (1994). Social network analysis. Cambridge, UK: Cambridge University Press.

Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of 'small-world' networks. Nature, 393, 440–442.
