The Dynamics of Multidisciplinary Research Networks

This document is an extended abstract for the World Social Science Forum (WSSF), Canada: http://www.wssf.org/paper/article/dynamics-multidisciplinary-research-networks-mining-public-repository-scientists-cvs

The Dynamics of Multidisciplinary Research Networks - Mining a Public Repository of Scientists’ CVs

Claudia Bauzer Medeiros∗

Jesus Mena-Chalco†

Abstract

Brazilians involved in any field of research are required to publish their CVs in the national Curriculum Vitae public database, called Lattes. This database contains information about all aspects of a researcher’s activity, from ongoing and past projects to students supervised and results of any intellectual endeavour (e.g., patents, publications, production of videos, or participation in performances). As of September 2013, there were about 3.2 million CVs in this database, freely accessible via the Lattes web portal. Brazilian scientists in many fields use this database to derive information about research profiles, collaboration across disciplines, or the economics of research trends. It has also given rise to new results in Computer Science, where new algorithms to mine this database are helping social scientists better understand the dynamics of research networks in Brazil. This paper presents a first analysis of co-authorship networks and of terms used in publications in three Lattes groups – the Applied Social Sciences, Humanities, and Linguistics, Letters and Arts. Our analysis is based on mining data from approximately 630,000 researchers in these fields and almost 5 million publications.

1 Introduction and Motivation

The goal of this paper is to provide insights into the publication and cooperation patterns of Brazil-based researchers in the Humanities, Applied Social Sciences, and Linguistics, Letters and Arts. Like most such studies, our analysis is based on publications. However, unlike other such studies, we take advantage of Brazil’s unique Lattes CV database, in which a wide variety of publications are recorded (e.g., not only peer-reviewed work, but also research reports, newspaper articles, or music composed). This provides a singular, more comprehensive view of research dissemination mechanisms in these areas. To the best of our knowledge, ours is the first work to conduct this kind of analysis for these knowledge domains.

∗ Institute of Computing, University of Campinas, Brazil
† Federal University of the ABC, Brazil

Publications are the most recognized means of disseminating scientific work. In some fields, other means of dissemination are also considered. For instance, in Computer Science (from now on shortened to CS), the production of software or of scientific databases to be used in research is becoming accepted as a new modality of scientific publication. As a consequence, software downloads and database usage by others are starting to be adopted in this community alongside citation indices. In the Arts, multimedia artifacts are also considered publications.

The Web has made it possible to instantly broaden the dissemination of these research products worldwide – not only through journal sites, from which one can download papers, but also through digital libraries and even personal pages [Thelwall2008]. While this facilitates scientific dissemination, it also creates the problem of too much information to analyze. Information overload – or, alternatively, what has become known as the data deluge – is an important issue here [Bell et al.2009, Hey et al.2009]. As pointed out in [Rosenberg2003], the notion of “information overload” is not new. As that paper notes, as early as the 16th century scholars complained that the invention of the printing press was multiplying the number of available books, leading to an explosion of the texts that had to be read in order to gain knowledge. Nowadays, we face a new facet of the same phenomenon: thanks to the widespread dissemination of digital media, most data now being produced will never be seen by humans and requires software for its analysis and interpretation.
This growth of publication modalities and multiplication of media formats poses several challenges for analyzing the scientific production of a researcher or of a research group. Here, one faces at least four issues. First, how to go through the “publications” (in a wide sense) of a person, or a group, if they are spread over a wide range of sites? Second, how to correlate publications in so many formats and media? Third, how to analyze (multimedia) research relationships across researchers and groups? Last but not least, since data describing publications is heterogeneous across sites, how to examine this data to derive useful information on research dynamics? The fields of bibliometrics and scientometrics¹ are dedicated to answering some of these questions, defining parameters and laws to derive information such as a person’s publication profile or the most influential journals in a given domain [Glänzel and Schubert2003, Zhang et al.2013].

¹ Here, these two terms are used interchangeably, since they are generally considered to be synonyms.

In the last decade, bibliometrics research has taken advantage of work in text mining and information retrieval (e.g., [Baeza-Yates and Ribeiro-Neto1999, Manning et al.2008]) to support, among other things, studies of interdisciplinary collaboration and publication trends. These text mining algorithms, developed within Computer Science research, are mostly run on standard bibliometric sources, notably journal citation databases and publishers’ sites (e.g., [Chen2008]). Thus, a large percentage of scientometric studies are based on journal publications. In some cases, these studies may look at specific digital libraries, usually restricted to a given scientific domain. For instance, in Computer Science, the digital libraries of the Association for Computing Machinery (http://www.acm.org) and the IEEE Computer Society (http://ieee.computer.org) publish thousands of conference papers every year. These libraries can serve as a basis for analyzing Computer Science research trends by applying text mining techniques to paper titles, keywords and abstracts. They are a particularly valuable resource, considering that, in Computer Science, scientometric studies show that researchers publish, on average, 4 full conference papers per journal paper. An example of this kind of conference-based study is [Cohoon et al.2011], which analyzed over 3,000 conferences held in the period 1966-2009, with 356,703 authors, to understand gender publication patterns in Computer Science. Yet another bibliometric source for CS papers is DBLP (http://www.informatik.uni-trier.de/~ley/db), which indexes thousands of titles published every year all over the world and is widely used to study research patterns in this field (e.g., [Reitz and Hoffmann2010]).
In most cases, however, journal papers, books and book chapters are the only basis for analyzing research profiles, trends and networks. This hampers studies in areas such as the Arts because of, among other things, the lack of standards (and data sources) for publication metadata, given the increasing amount of multimedia production. This paper takes advantage of a unique source of research information – the Brazilian Lattes Web database, in which all researchers who work in Brazil are required to enter their CVs. By September 2013, Lattes had over 3.2 million curricula, stored in a standard format. This database is being used for several kinds of bibliometric studies. Here, we have used Lattes as our basic information source for analyzing research patterns in the aforementioned areas, for the period 1991-2010. The rest of this paper is organized as follows. Section 2 gives a brief overview of the Lattes platform. Section 3 presents the main results of our analysis. Section 4 concludes the paper and indicates a few future directions.

2 The Lattes platform

The Lattes Platform (http://lattes.cnpq.br) is a Web-based set of databases and software tools, maintained by the Brazilian government (Ministry of Science and Technology), that holds a wide range of information about research and development. The Lattes platform has been fully adopted by the Brazilian academic community [Lane2010]. Its most frequently accessed component is the Lattes Curriculum, a comprehensive database that, as of September 2013, contained more than 3.2 million CVs of (conceivably all) people in Brazil who are involved in some form of research; it also includes CVs of several researchers who do not live in the country but participate in Brazilian-funded projects. The Lattes CV is a document created in the 1970s by CNPq² to standardize and centralize personal, professional and academic information about the Brazilian scientific community. Originally recorded on paper forms, it was first automated in the 1990s. Each person enters his or her own Lattes data on the Web. These CVs are used all over the country by research grant agencies to evaluate researchers and graduate programs, and also as an information source to help evaluate research proposals. For instance, grant proposals to federal agencies are submitted online and must always be accompanied by the (http) links to the CVs of all project members, to help reviewers evaluate the project team. Thanks to this nationwide use, Brazilian researchers update their Lattes CVs as a matter of routine. The Lattes Platform is thus a comprehensive source for analyzing the dynamics of several knowledge areas. For instance, [Laender et al.2008] used Lattes information, together with DBLP, to analyze the productivity of Brazilian Computer Science graduate programs. Similar analyses have been conducted for sciences such as Physics and Mathematics. Individual Lattes curricula can be obtained via the Web platform and exported to printable or XML formats.
The latter is often used by researchers to integrate Lattes information with other data sources. In addition, XML CVs can be obtained in “bulk” for text mining research in specific projects, given appropriate permission. In 2011, CNPq released the Painel Lattes (Lattes panel) software tool (http://estatico.cnpq.br/painelLattes/mapa), which shows distinct kinds of statistical information extracted from the CVs – e.g., pie charts on age, geographic or gender distribution, or research fields.

² Brazilian National Council of Scientific and Technological Development, which is the Brazilian federal research foundation for Science and Technology.

When entering CV information, a person can provide data on the following items:
• General and Professional data (e.g., address, academic background, positions held, courses given, prizes received)
• Research projects (details on projects led by a person, or in which (s)he participates – e.g., project title, abstract, members, period of time, amount awarded)
• Research products/outputs (most kinds of products of one’s research and development activities, divided into Publications, Technical products, and Artistic and cultural products. Here, one can respectively enter, for instance, all kinds of publications, results of consulting activities, films directed/produced and so on. Each type of product has a specific form for entering its details; in particular, papers have links to their DOIs, which allows accessing a paper directly from a person’s CV)
• Innovation (e.g., patents, software)
• Supervisions (students supervised at all levels; the platform automatically links each student’s name to that student’s Lattes CV, which facilitates analysis of supervision networks)
• Additional classes of data, such as Events (e.g., scientific events organized), Committees (e.g., participation in scientific committees), or impact factors

A printed full Lattes CV may have over 200 pages, depending on how much information a person provides (e.g., some people describe their projects at length). Typically, academics provide General and Professional data, relevant Research products, and Supervision information.
The Lattes CV database is thus a valuable and unique scientometric data source, for three reasons: (i) the Lattes curricula have become a standard for the evaluation of individual academic/professional activities; (ii) the majority of Brazilian researchers are registered in the Lattes Platform; and (iii) each curriculum contains a list of the person’s academic/professional output, all put together in a single site, in a standard format. Moreover, since it contains data on a large range of research outcomes, it offers a comprehensive set of facts from which to draw portraits of the research dynamics of a person, a group, or even a research field. This is particularly valuable for domains such as the Arts or Humanities, where it is hard to find comprehensive (and standardized) production catalogues.


The Lattes curriculum is self-reported and does not require the registered publications to be indexed by Scopus, Web of Science, or SciELO. Note that, as reported by [Hicks and Wang2009], Scopus and Web of Science provide only limited coverage of the bibliographic production in the Social Sciences, Humanities, and Arts.
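Since individual CVs can be exported as XML, a short script can pull structured records out of them. The sketch below uses Python’s standard xml.etree.ElementTree module to list the journal articles in one CV. It is a minimal illustration under stated assumptions: the element and attribute names (CURRICULO-VITAE, DADOS-BASICOS-DO-ARTIGO, TITULO-DO-ARTIGO, ANO-DO-ARTIGO and so on) and the sample record are hypothetical, modeled loosely on the Lattes export rather than taken from its official schema, and extract_articles is a name introduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified CV export. The element and attribute names
# are illustrative assumptions modeled loosely on the Lattes XML export; they
# have not been checked against the official schema.
SAMPLE_CV = """
<CURRICULO-VITAE NUMERO-IDENTIFICADOR="0000000000000001">
  <DADOS-GERAIS NOME-COMPLETO="Maria Exemplo"/>
  <PRODUCAO-BIBLIOGRAFICA>
    <ARTIGOS-PUBLICADOS>
      <ARTIGO-PUBLICADO>
        <DADOS-BASICOS-DO-ARTIGO TITULO-DO-ARTIGO="Educação e saúde no Brasil"
                                 ANO-DO-ARTIGO="1994"/>
      </ARTIGO-PUBLICADO>
      <ARTIGO-PUBLICADO>
        <DADOS-BASICOS-DO-ARTIGO TITULO-DO-ARTIGO="Políticas públicas de ensino"
                                 ANO-DO-ARTIGO="2007"/>
      </ARTIGO-PUBLICADO>
    </ARTIGOS-PUBLICADOS>
  </PRODUCAO-BIBLIOGRAFICA>
</CURRICULO-VITAE>
"""


def extract_articles(xml_text):
    """Return (cv_id, full_name, [(title, year), ...]) for journal articles."""
    root = ET.fromstring(xml_text.strip())
    cv_id = root.get("NUMERO-IDENTIFICADOR")
    general = root.find("DADOS-GERAIS")
    name = general.get("NOME-COMPLETO") if general is not None else None
    articles = []
    # iter() walks the whole tree, so the nesting depth does not matter here.
    for basic in root.iter("DADOS-BASICOS-DO-ARTIGO"):
        title = basic.get("TITULO-DO-ARTIGO", "").strip()
        year = basic.get("ANO-DO-ARTIGO", "")
        if title and year.isdigit():  # skip entries without a usable year
            articles.append((title, int(year)))
    return cv_id, name, articles


if __name__ == "__main__":
    print(extract_articles(SAMPLE_CV))
```

Records extracted this way, one list per CV, are the kind of input the group-level analyses below operate on.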

3 Data analysis

3.1 Base data

When creating a Lattes CV, researchers must indicate their primary research field (classified into 8 major fields, each with several subfields), and thus each CV can be globally associated with one of these fields. Publications can also be classified according to this taxonomy. For the purpose of this study, we selected researchers who had indicated their main field as one of the following: (a) Humanities, (b) Applied Social Sciences, and (c) Linguistics, Letters and Arts (each of which is one major Lattes class). This section describes some of our findings based on an analysis of 633,508 Lattes curricula in these three areas, covering twenty years (1991-2010). The base data used in this work was extracted from the Lattes database in May 2011 and contains 1,236,548 curricula [Mena-Chalco et al.2013]. Indices associated with scientific contributions, such as number of citations or impact factor, are not considered, because this data is not available for all bibliographic production types registered on the Lattes Platform. For our study, we included the following categories of publication: (i) Articles in scientific journals; (ii) Books published/organized; (iii) Book chapters published; (iv) Full papers, extended abstracts, and abstracts published in conference proceedings; and (v) Articles in newspapers/magazines. Hence, from now on, whenever we talk about our analysis of “publications”, we include all these categories of knowledge dissemination. For our analysis, we extracted 270,149 CVs in the Humanities, 264,230 in the Applied Social Sciences and 99,129 in Linguistics, Letters and Arts. Overall, the titles of 4,946,990 publications were mined. The identified publications for each major field were: 2,743,625 (Humanities), 1,591,477 (Applied Social Sciences), and 611,888 (Linguistics, Letters and Arts).
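The counts above can be cross-checked with a few lines of arithmetic. The sketch below (the function name summarize is ours) recomputes the totals and derives two simple ratios that the text does not report: the share of the May 2011 extraction covered by the three areas, and the mean number of publication records per CV. These ratios are illustrative only; in particular, a paper recorded by several co-authors may be counted once per CV.

```python
# Counts reported in Section 3.1 (Lattes extraction of May 2011).
CVS = {
    "Humanities": 270_149,
    "Applied Social Sciences": 264_230,
    "Linguistics, Letters and Arts": 99_129,
}
PUBLICATIONS = {
    "Humanities": 2_743_625,
    "Applied Social Sciences": 1_591_477,
    "Linguistics, Letters and Arts": 611_888,
}
ALL_CVS_IN_EXTRACTION = 1_236_548


def summarize(cvs, pubs, total_cvs):
    """Recompute totals and derive simple ratios from the reported counts."""
    n_cvs = sum(cvs.values())
    n_pubs = sum(pubs.values())
    return {
        "cvs": n_cvs,
        "publications": n_pubs,
        # Fraction of the whole extraction that falls in the three areas.
        "share_of_extraction": round(n_cvs / total_cvs, 3),
        # Mean publication records per CV, per field and overall.
        "pubs_per_cv": {f: round(pubs[f] / cvs[f], 2) for f in cvs},
        "overall_pubs_per_cv": round(n_pubs / n_cvs, 2),
    }


if __name__ == "__main__":
    for key, value in summarize(CVS, PUBLICATIONS, ALL_CVS_IN_EXTRACTION).items():
        print(key, value)
```

On the reported figures this gives 633,508 CVs and 4,946,990 publications, about 51% of the 1,236,548 extracted curricula, and roughly 10.2 publication records per CV in the Humanities versus about 6.0 in the Applied Social Sciences and 6.2 in Linguistics, Letters and Arts.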

3.2 Analysis methodology and ScriptLattes

The Lattes curricula contain individual information for each researcher registered on the Platform. Thus, summarizing, or building a knowledge representation of, the research outputs of a medium or large group of researchers requires considerable manual effort and a myriad of processes, which are susceptible to failures that can significantly influence the measurements obtained. We used the ScriptLattes [Mena-Chalco and Cesar Junior2013] software to automatically analyze Lattes data. This software was designed and developed as part of research on text mining algorithms. ScriptLattes is being widely used in Brazil to analyze research trends and correlations. For instance, [Perez-Cervantes et al.2012] use the software to investigate the internationalization of Brazilian academic research groups; [Mugnaini et al.2013] discuss the relationships between supervision of graduate theses and participation in examination committees in the Brazilian physics community, while [Mena-Chalco et al.2013] analyze co-authorship networks. The software is also used in the evaluation of graduate CS programs for the Ministry of Education, to establish correlations across programs, help define evaluation criteria and, ultimately, understand differences in research relationships across multiple CS fields. ScriptLattes is an open-source system for extracting knowledge from the Lattes Platform, commonly used to analyze individual Lattes curricula, automatically summarize their scientific production, and discover co-authorship networks for large groups. Given a group of researchers, the software downloads their Lattes curricula from the Platform and extracts their scientific output, eliminating redundant bibliographic entries. It then creates reports about the production, including academic supervisions and the co-authorship network, and produces cartographic maps that show where groups and researchers are located. These reports are broken down by type and publication year. ScriptLattes can be used for exploring, identifying and validating patterns of scientific activity, and for constructing bibliometric/scientometric information about a group of interest.
It is important to note that the system only deals with information from the Lattes database (i.e., it does not combine such information with other sources, e.g., journal citation sites).
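ScriptLattes’ own implementation is not described here, so the following is only a minimal sketch, under assumptions, of two of the steps listed above: eliminating redundant bibliographic entries when the same paper appears in several members’ CVs, and tallying the remaining production by type and publication year. The record layout (lattes_id, pub_type, year, title), the exact-match-after-normalization rule, and the function names are hypothetical choices made for illustration.

```python
import re
import unicodedata
from collections import Counter


def normalize_title(title):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.findall(r"[a-z0-9]+", no_accents.lower()))


def group_report(records):
    """Count a group's production by (type, year), counting shared papers once.

    records: iterable of (lattes_id, pub_type, year, title) tuples. Two entries
    are treated as the same item when type, year and normalized title match
    (a hypothetical rule chosen for this sketch).
    """
    seen = set()
    report = Counter()
    for _lattes_id, pub_type, year, title in records:
        key = (pub_type, year, normalize_title(title))
        if key in seen:
            continue  # redundant entry: already counted from another CV
        seen.add(key)
        report[(pub_type, year)] += 1
    return report


if __name__ == "__main__":
    records = [
        ("id-A", "journal", 2005, "Educação e Saúde"),
        ("id-B", "journal", 2005, "educacao e saude."),  # same paper, co-author's CV
        ("id-A", "chapter", 2005, "Leitura na escola"),
        ("id-C", "journal", 2006, "Ensino de Língua"),
    ]
    print(group_report(records))
```

Exact matching after normalization is deliberately strict; titles that differ by more than accents and punctuation would need the kind of fuzzy comparison discussed in the next section.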

3.3 Research dynamics - example in the Humanities

We used ScriptLattes to analyze research and co-authorship networks for a twenty-year period covering 1991-2010 (divided into four quinquennia for better visualization of results, namely, 1991-1995, 1996-2000, 2001-2005 and 2006-2010). For collaboration networks (endogenous networks), the basic data comprised all papers published by a person (the center of a given network) and then the person’s co-authors. We must point out that, for this study, only co-authors registered in the Lattes database were considered (and thus non-Brazilian co-authors without a Lattes entry were not analyzed). Each author has a unique identifier within the Lattes database, which is used to correlate people. Co-authorship is established whenever all authors independently record a given paper in their CVs. Possible differences in title or publication venue go through automatic verification (using text comparison methods), so that even if titles are not identical across CVs, the co-authorship can be confirmed. Figure 1 shows the evolution of the co-authorship network for a Humanities researcher who is highly regarded by her peers. The figure shows that, in the first quinquennium (1991-1995), at the top left, the corresponding graph was still relatively sparse, with relationships being built, while in the subsequent decade (1996-2005) her research connections became multiple and well established. This researcher is now close to retirement, and the final quinquennium (2006-2010), at the bottom right, shows a decrease in relationships. However, this does not mean that she has stopped publishing papers – Figure 2 shows her papers for the same period. The figure indicates a steady publication flow throughout the years, with an actual increase in the number of papers in the periods in which the co-authorship graphs are sparser.
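The text comparison used to confirm co-authorship is not specified. As one possible approach, the sketch below treats two CV entries as the same paper when they share a publication year and their accent- and punctuation-folded titles reach a similarity ratio of at least 0.9 under Python’s difflib.SequenceMatcher; the threshold, the same-year rule, and the helper names are assumptions. Pairs of Lattes identifiers that share at least one such paper become edges of the co-authorship network.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations


def fold(title):
    """Lower-case, remove accents and punctuation so near-identical titles align."""
    decomposed = unicodedata.normalize("NFKD", title)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.findall(r"[a-z0-9]+", plain.lower()))


def same_paper(a, b, threshold=0.9):
    """True if two (title, year) entries plausibly describe the same paper.

    The same-year rule and the 0.9 similarity threshold are assumptions.
    """
    (title_a, year_a), (title_b, year_b) = a, b
    if year_a != year_b:
        return False
    return SequenceMatcher(None, fold(title_a), fold(title_b)).ratio() >= threshold


def coauthorship_edges(cvs, threshold=0.9):
    """cvs: dict mapping a Lattes id to a list of (title, year) entries.

    Returns the set of (sorted) id pairs that share at least one paper.
    """
    edges = set()
    for id_a, id_b in combinations(sorted(cvs), 2):
        if any(same_paper(p, q, threshold) for p in cvs[id_a] for q in cvs[id_b]):
            edges.add((id_a, id_b))
    return edges


if __name__ == "__main__":
    group = {
        "id1": [("Leitura e escrita na escola", 1998)],
        "id2": [("Leitura e escrita na escola.", 1998), ("Outro tema", 2001)],
        "id3": [("Políticas sociais no Brasil", 1998)],
    }
    print(coauthorship_edges(group))
```

Comparing every pair of CVs is quadratic in the group size, which is acceptable for a single research group but would need blocking (for example by year) for hundreds of thousands of CVs.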

Figure 1: Evolution of co-authorship relationships for a Humanities researcher. At the top, left to right, periods 1991-1995 and 1996-2000; at the bottom, left to right, 2001-2005 and 2006-2010.

Figure 2: Number of papers per year - same researcher. [Plot: Publications (0-35) per Year, 1991-2010.]

This researcher’s co-authorship network is not typical of this domain. Most of the graphs we analyzed for Humanities researchers are sparse, with a relatively small number of co-authors overall, indicating a scenario of close-knit research networks. Nevertheless, in most cases the co-authorship network grows over time, as a person’s research career evolves. We chose to use her Lattes data to illustrate the kind of temporal analysis that can be performed visually. We also point out that we opted for graphs without people’s names, although ScriptLattes can also produce graphs with names. Another issue is that these graphs consider only co-authors who have Lattes CVs. Therefore, they are less representative of a person’s co-authorship network when this person mostly works with researchers who do not live in Brazil. As we shall see in the next section, however, this may be less relevant for the fields studied here, since a large majority of their publications are in Portuguese (possibly indicating a smaller percentage of collaborations outside Brazil).
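The per-quinquennium comparison made with Figures 1 and 2 (papers versus co-authors) can be expressed as a small aggregation. In this sketch, each paper of the central researcher is given as a year plus the Lattes identifiers of its co-authors who have CVs; the input format and function names are hypothetical, and the number of distinct co-authors stands in for the size of the ego network in each period.

```python
from collections import defaultdict

QUINQUENNIA = [(1991, 1995), (1996, 2000), (2001, 2005), (2006, 2010)]


def period_of(year):
    """Label of the quinquennium containing `year`, or None if out of range."""
    for start, end in QUINQUENNIA:
        if start <= year <= end:
            return f"{start}-{end}"
    return None


def ego_profile(papers):
    """papers: list of (year, coauthor_ids) for one researcher.

    Returns {period: (number_of_papers, number_of_distinct_coauthors)}, where
    only co-authors with a Lattes identifier are expected in coauthor_ids.
    """
    n_papers = defaultdict(int)
    coauthors = defaultdict(set)
    for year, ids in papers:
        period = period_of(year)
        if period is None:
            continue  # outside the 1991-2010 window
        n_papers[period] += 1
        coauthors[period].update(ids)
    labels = [f"{start}-{end}" for start, end in QUINQUENNIA]
    return {label: (n_papers[label], len(coauthors[label])) for label in labels}


if __name__ == "__main__":
    toy = [(1992, ["a"]), (1997, ["a", "b", "c"]), (1999, ["d"]),
           (2003, ["b", "e"]), (2007, []), (2008, ["a"]), (2009, [])]
    print(ego_profile(toy))
```

A rising paper count alongside a falling co-author count in the last period, as in the toy input above, is the pattern the text describes for the researcher in Figure 1.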

3.4 Research dynamics - common terms

While the graphs in the previous section are useful for analyzing the dynamics of research cooperation for individuals or small groups, they are less useful for visualizing research interactions within large groups, because of the multitude of nodes and connections that appear; for instance, that researcher’s graphs contained more than 100 different names overall. To continue our study, we therefore set out to find common research topics and keywords in a given field. To this end, we built Wordle “clouds” from publication titles for the period under study, for the same three research areas – Humanities, Applied Social Sciences, and Linguistics, Letters and Arts. Wordle (http://www.wordle.org) is a Web-based tool for creating word maps (or clouds) from a given text. Clouds give greater prominence to words that appear more frequently in the source text; for these renderings, we used the 200 most frequent terms across the periods. Figures 3, 4, 5 and 6 show the evolution of the 200 most frequent terms in the Humanities (for the period 1991-2010). First of all, we point out that they are almost all in Portuguese. Brazilian research in the three areas investigated is mostly published in Portuguese, which hampers its indexing in international indices. It also hampers studies of this research, since such studies are mostly conducted over international databases, such as those conducted by [Hicks and Wang2009]. These figures show some intriguing patterns. For instance, the term educação (education) is prominent in all periods, and seems to be growing in importance over time. The term saúde (health) consistently appears among the top 20. On the other hand, the term Brazil has slowly disappeared from the titles, becoming invisible at the end of the 20-year period. A slower decrease can be seen, for instance, for estudo (study), whose presence in titles is declining; the same applies to avaliação (evaluation). Another kind of pattern is exemplified by ensino (teaching), which was prominent in the first ten years, disappeared in the period 2001-2005, and reappeared in the last quinquennium.
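Wordle’s internal processing is not described in the paper. The sketch below only shows the kind of computation a frequency-based cloud rests on: tokenize titles, fold accents (the tree clouds show unaccented forms such as educacao), drop a few stopwords, count terms per quinquennium, and report where a given term ranks in each period’s top-N list. The stopword list, the three-letter minimum word length, and the function names are assumptions.

```python
import re
import unicodedata
from collections import Counter

# Tiny illustrative Portuguese stopword list (an assumption, far from complete).
STOPWORDS = {"a", "o", "as", "os", "e", "de", "da", "do", "das", "dos",
             "em", "na", "no", "nas", "nos", "para", "por", "com", "um", "uma"}


def terms(title):
    """Accent-folded, lower-case words of a title, minus stopwords and short words."""
    decomposed = unicodedata.normalize("NFKD", title.lower())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return [w for w in re.findall(r"[a-z]+", plain)
            if w not in STOPWORDS and len(w) > 2]


def top_terms(titles, n=200):
    """The n most frequent terms across a list of titles, as (term, count) pairs."""
    counts = Counter()
    for title in titles:
        counts.update(terms(title))
    return counts.most_common(n)


def rank_by_period(titles_by_period, term, n=200):
    """1-based rank of `term` in each period's top-n list, or None if absent."""
    ranks = {}
    for period, titles in titles_by_period.items():
        top = [t for t, _count in top_terms(titles, n)]
        ranks[period] = top.index(term) + 1 if term in top else None
    return ranks


if __name__ == "__main__":
    toy = {
        "1991-1995": ["Educação no Brasil", "O Brasil e a saúde", "Estudo sobre o Brasil"],
        "2006-2010": ["Educação e saúde", "Educação infantil", "Gestão da educação"],
    }
    print(rank_by_period(toy, "brasil", n=3), rank_by_period(toy, "educacao", n=3))
```

Applied to the real title sets of each quinquennium, ranks of terms such as educacao or brasil would put numbers on the trends read off the clouds.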

Figure 3: Wordle cloud for Humanities terms found in paper titles – 1991-1995

Figure 4: Wordle cloud for Humanities terms found in paper titles – 1996-2000

Figure 5: Wordle cloud for Humanities terms found in paper titles – 2001-2005

Figure 6: Wordle cloud for Humanities terms found in paper titles – 2006-2010

These title terms can also be visualized using another structure, which allows analysis of title construction patterns. Figure 7 shows a so-called tree cloud of terms from the Humanities for the twenty-year period. TreeCloud [Gambette2007] is a tool that builds a tree cloud visualization of a text, which looks like a tag cloud where the tags are displayed around a tree to reflect the semantic distance between the words in the text. This tree shows, among other things, where words tend to appear within a title. In the figure, terms at the top are the ones most likely to appear at the beginning of a Humanities paper title (e.g., in English translation: policies, social sciences, relationships, practices, reading), while those at the bottom right are most likely to appear at the end (e.g., education, teachers, teaching, analysis, Brazil). We produced the same sort of study for the other two areas. For brevity’s sake, we show only the first and last quinquennium of each, and briefly provide some examples of this analysis. Figures 8 and 9 display Wordle clouds for publication titles in the Applied Social Sciences. In this domain, Brazil was one of the top terms in the first quinquennium, and progressively lost ground to análise (analysis) and estudo (study). Terms such as desenvolvimento (development) and estado (state) are consistently among the top twenty. Terms such as saúde (health), gestão (management) and empresas (enterprises) are gaining importance, while qualidade (quality) and trabalho (work) progressively disappeared.
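TreeCloud arranges words according to a semantic distance derived from the text; the sketch below does not reproduce that. It computes a simpler, different statistic that speaks to the beginning/end reading of the tree clouds: the mean relative position of each word in the titles where it occurs (0 for the first word, 1 for the last). The tokenization, the min_count filter, and the function names are assumptions for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def tokens(title):
    """Accent-folded, lower-case words of a title (stopwords are kept)."""
    decomposed = unicodedata.normalize("NFKD", title.lower())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"[a-z]+", plain)


def mean_relative_position(titles, min_count=2):
    """Average position of each word in [0, 1]: 0 = first word, 1 = last word.

    Words seen fewer than `min_count` times are dropped. One-word titles are
    skipped because a relative position is undefined for them.
    """
    positions = defaultdict(list)
    for title in titles:
        words = tokens(title)
        if len(words) < 2:
            continue
        last = len(words) - 1
        for i, word in enumerate(words):
            positions[word].append(i / last)
    return {w: sum(p) / len(p) for w, p in positions.items() if len(p) >= min_count}


if __name__ == "__main__":
    sample = ["Políticas de educação", "Políticas públicas e educação", "Leitura e educação"]
    print(mean_relative_position(sample))
```

Words with a mean near 0 would correspond to the terms shown at the top of the tree clouds, and words near 1 to those placed at the end.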


Figure 7: Tree cloud showing the relationships among terms in the Humanities.

Figure 8: Wordle cloud for terms found in paper titles of Applied Social Sciences – 1991-1995

Figure 9: Wordle cloud for terms found in paper titles of Applied Social Sciences – 2006-2010

The tree cloud for the twenty years (Figure 10) shows that terms such as social policies, environment, rights, tourism or culture are more likely to appear at the beginning of a title. Titles are more likely to end with, for instance, Brazil, industry, case study, or social development.

Figure 10: Tree cloud showing the relationships among terms within titles in the Applied Social Sciences.

Finally, Figures 11 and 12 show the first and last quinquennia for Linguistics, Letters and Arts. Língua (language) and leitura (reading) are consistently at the top throughout the twenty years, while literatura (literature), ensino (teaching) and educação (education) have been progressively climbing to the top. História (history) also consistently appears in the first places. As in the other two analyses, Brazil has progressively lost importance.

Figure 11: Wordle cloud for terms found in paper titles in Linguistics, Letters and Arts – 1991-1995

Figure 12: Wordle cloud for terms found in paper titles in Linguistics, Letters and Arts – 2006-2010

Figure 13 is the tree cloud for titles in this domain. Terms more likely to be at the beginning of publication titles include linguística (linguistics), estudos (studies), linguagem (language) and produção de textos (text production). Terms that tend to appear at the end of titles include, for instance, leitura (reading) and língua portuguesa (Portuguese language).

Figure 13: Tree cloud showing the relationships among terms within titles in Linguistics, Letters and Arts.

For a final comparison, Figure 14 shows the Wordle cloud for Computer Science, based on 4,010 papers published in the period 2006-2010 by the 895 faculty members presently registered in Brazil’s 48 graduate CS programs. Though there are a few terms in Portuguese, the great majority are in English, reflecting the fact that English is the de facto publication language for computer scientists.


Figure 14: Wordle cloud for Computer Science terms found in paper titles – 2006-2010.
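The contrast between mostly English CS titles and mostly Portuguese titles in the other three areas could be quantified per title. The heuristic below is not the authors’ method and not a real language identifier: it simply counts a handful of Portuguese and English function words (both lists are small, hand-picked assumptions) and labels each title accordingly; guess_language and language_shares are names introduced here.

```python
import re

# Small, hand-picked function-word lists (assumptions). Words that are common
# in both languages' titles, such as "a" or "as", are deliberately left out.
PT = {"de", "da", "do", "das", "dos", "em", "na", "no", "para", "uma", "um",
      "e", "o", "os", "com", "sobre"}
EN = {"the", "of", "and", "for", "in", "on", "an", "to", "with", "using"}


def guess_language(title):
    """Return 'pt', 'en', or 'unknown' by counting function words."""
    words = re.findall(r"[^\W\d_]+", title.lower())  # Unicode letters only
    pt = sum(w in PT for w in words)
    en = sum(w in EN for w in words)
    if pt > en:
        return "pt"
    if en > pt:
        return "en"
    return "unknown"


def language_shares(titles):
    """Fraction of titles labeled 'pt', 'en' and 'unknown'."""
    counts = {"pt": 0, "en": 0, "unknown": 0}
    for title in titles:
        counts[guess_language(title)] += 1
    total = len(titles) or 1  # avoid division by zero on an empty list
    return {lang: n / total for lang, n in counts.items()}


if __name__ == "__main__":
    print(language_shares([
        "Uma análise da educação no Brasil",
        "A survey of query processing in sensor networks",
        "Bioinformática",
    ]))
```

Short titles with no function words fall into the unknown bucket, so any share computed this way is a rough lower bound for each language.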

4 Conclusions

This paper presented a brief overview of research patterns in Brazil for three areas – the Humanities, Applied Social Sciences, and Linguistics, Letters and Arts. The study was based on mining the Brazilian national Lattes curriculum vitae database. Our work relies heavily on the data mining tools developed in the ScriptLattes platform [Mena-Chalco and Cesar Junior2013]. As far as we know, this is the first study of this kind to analyze research dynamics in Brazil for these domains. Overall, a few patterns emerge for these three areas – e.g., the emphasis on educational aspects and the predominance of Portuguese titles. Moreover, for all three, the term Brazil lost ground in titles, perhaps indicating an increase in studies with a non-national scope. As discussed here, the Lattes database can enhance the understanding of research interactions among researchers in these areas in Brazil, thanks to its comprehensive representation of the Brazilian bibliographic production in these fields. It can also support studies of term usage in titles, as well as of keywords recorded by researchers. Many of these publications are in Portuguese, and thus not indexed in specialized databases such as Scopus or Web of Science. Future work will try to identify research networks across distinct fields (e.g., Humanities and Biological Sciences). Another need is to identify international cooperation. This is a much harder task, since it requires looking at the papers themselves and deriving author locations from author addresses. This kind of information cannot be obtained from the present Lattes version. The Brazilian Ministry of Education is developing a complementary platform, called Sucupira, which will combine individual curricula from Lattes with information provided by graduate programs all over the country. This will be yet another information source for future work.

Acknowledgements

The work reported in this paper was partially funded by the Brazilian financing agencies FAPESP and CNPq.


References

[Baeza-Yates and Ribeiro-Neto1999] Baeza-Yates, R. and Ribeiro-Neto, B. (1999). Modern Information Retrieval. ACM Press, New York, USA.
[Bell et al.2009] Bell, G., Hey, T., and Szalay, A. (2009). Beyond the data deluge. Science, 323(5919):1297–1298.
[Chen2008] Chen, C. (2008). Classification of scientific networks using aggregated journal-journal citation relations in the journal citation reports. Journal of the American Society for Information Science and Technology, 59(14):2296–2304.
[Cohoon et al.2011] Cohoon, J., Nigai, S., and Kaye, J. (2011). Gender and computing conference papers. Communications of the ACM, 54(8):72–80.
[Gambette2007] Gambette, P. (2007). Tree Cloud. http://www2.lirmm.fr/~gambette/treecloud/.
[Glänzel and Schubert2003] Glänzel, W. and Schubert, A. (2003). A new classification scheme of science fields and subfields designed for scientometric evaluation purposes. Scientometrics, 56(3):357–367.
[Hey et al.2009] Hey, T., Tansley, S., and Tolle, K., editors (2009). The fourth paradigm. Microsoft Research.
[Hicks and Wang2009] Hicks, D. and Wang, J. (2009). Towards a bibliometric database for the social sciences and humanities. http://works.bepress.com/diana_hicks/18/.
[Laender et al.2008] Laender, A., Lucena, C., Souza e Silva, E., Maldonado, J., and Ziviani, N. (2008). Assessing the research and education quality of the top Brazilian computer science graduate programs. ACM SIGCSE Bulletin, 40(2):135–145.
[Lane2010] Lane, J. (2010). Let’s make science metrics more scientific. Nature, 464(7288):488–489.
[Manning et al.2008] Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.
[Mena-Chalco and Cesar Junior2013] Mena-Chalco, J. and Cesar Junior, R. (2013). ScriptLattes. http://scriptlattes.sourceforge.net/.
[Mena-Chalco et al.2013] Mena-Chalco, J., Digiampietri, L., Lopes, F., and Cesar Junior, R. (2013). Brazilian bibliometric co-authorship networks. Journal of the American Society for Information Science and Technology. (in press).
[Mugnaini et al.2013] Mugnaini, R., Digiampietri, L., and Mena-Chalco, J. (2013). Correlation among the scientific production, supervisions and participation in defense examination committees in the Brazilian physicists community. In Proc. 14th International Society of Scientometrics and Informetrics Conference, pages 447–474.
[Perez-Cervantes et al.2012] Perez-Cervantes, E., Mena-Chalco, J., and Cesar Junior, R. (2012). Towards a quantitative academic internationalization assessment of Brazilian research groups. In Proc. Workshop on Analyzing and Improving Collaborative eScience with Social Networks, within Proceedings of the IEEE e-Science Conference.
[Reitz and Hoffmann2010] Reitz, F. and Hoffmann, O. (2010). An analysis of the evolving coverage of computer science subfields in the DBLP digital library. In Research and Advanced Technology for Digital Libraries, Lecture Notes in Computer Science, volume 6273, pages 216–227. Springer Verlag.
[Rosenberg2003] Rosenberg, D. (2003). Early Modern Information Overload. Journal of the History of Ideas, 64(1):1–9.
[Thelwall2008] Thelwall, M. (2008). Bibliometrics to webometrics. Journal of Information Science, 34(4):605–621.
[Zhang et al.2013] Zhang, L., Thijs, B., and Glänzel, W. (2013). What does scientometrics share with other metrics sciences? Journal of the American Society for Information Science and Technology, 64(7):1515–1518.
