Zaytoonah University International Engineering Conference on Design and Innovation in Sustainability 2014 (ZEC Infrastructure 2014), May 13-15, 2014 Amman, Jordan Paper Code. No. 1569932175
A Measurement Study of Eccentricity in Scientific Papers’ Citation Network
Mohammad Z. Masoud, Ismaeel Janoud, Ahmad Al-Nahawy
Al-Zaytoonah University of Jordan 11733, Amman, Jordan
[email protected],
[email protected]
ABSTRACT
The emergence of graph theory and network science has assisted in studying and modelling a massive number of networks in the past decade. The Internet, the World Wide Web and social networks have been modelled and studied. Graph theory and its parameters have helped in understanding a vast number of phenomena in these networks. In this work, eccentricity is investigated in a citation network. We aim to demonstrate the relationship between eccentricity and the small-world phenomenon. In addition, we aim to map this important metric onto a physical meaning in the citation network. To facilitate our experiment, we constructed a citation graph by crawling the Citeseerx network. Our results show that the ‘rich get richer’ behaviour of ‘small world’ networks can be measured and observed through eccentricity, and not only through the power-law distribution.
Keywords: Citation Network, Eccentricity, Citeseerx
1 INTRODUCTION
Graph-theoretic performance metrics have proliferated in network science studies over the past decade. Social networks [6, 7], the World Wide Web (WWW) [5], autonomous system (AS) graphs [3], and router and IP graphs [2] have been constructed and studied. Researchers have used graph studies in two main directions. The first is to simulate and mimic real networks [1], which can be done by comparing the output of a real network with theoretically constructed graphs. The second is to apply graph properties and metrics to real-world networks in order to reveal interesting facts, such as scaling factors, growth rate, network failures and paths [4]. These facts emerge by mapping graph-theoretic values, such as betweenness [3], node degree, number of hops, diameter, radius and clustering coefficient, onto their physical meaning in the network. For example, it has been reported that the clustering coefficient and the average shortest path can define the graph type [4]: a small-world graph is a graph with a small average shortest path and a high clustering coefficient. Another example is the relationship between tier-1 AS networks and betweenness centrality in the AS graph [3]. Node degree in the AS graph has been used to infer peering links. Many metrics have been mapped and studied. However, are there other metrics that can be measured and mapped onto a physical meaning in network science?
The eccentricity of a node is defined as the maximum graph distance between that node and any other node [8]. It describes the largest number of hops that a node requires to reach any other node in the graph. The maximum eccentricity value in the graph is the graph diameter, while the smallest eccentricity value is the graph radius. In this work, we study the eccentricity of nodes in a citation graph. Our purpose is to investigate the physical meaning of eccentricity in the citation graph. In addition, we aim to demonstrate the relation between eccentricity and the small-world phenomenon.
To facilitate our work, we generated a citation graph with scientific papers as nodes and citations as directed links between these nodes. To generate this graph, a web crawler was implemented to harvest scientific paper information from the Citeseerx repository. We collected papers related to the ad-hoc networking field. We then used Python to generate our directed graph and calculate the eccentricity value of each node. Moreover, we created three different graphs according to the time difference between connected papers. The purpose of this step is to study the relation between eccentricity and the growth rate of the graph. In this work we seek to answer the following questions:
- What is the meaning of eccentricity in a citation network?
- What is the relation between the small-world phenomenon and eccentricity?
- How does the eccentricity value vary with the growth rate of the citation graph?
The rest of this paper is organized as follows. The next section introduces the citation graph and eccentricity. Section 3 describes our experiment. Section 4 presents the obtained results. Finally, we conclude the paper in Section 5.
2 CITATION GRAPH AND ECCENTRICITY
In this section, we first define the citation graph, or citation network. We then define eccentricity and its properties.
2.1 Citation Graph
A citation graph is a directed graph in which papers are represented by nodes. A link between two nodes denotes a citation relationship between the two papers. The direction of the links is based on time: links start from newer nodes (the citing papers) and terminate at older nodes (the cited papers). A citation graph may also be constructed from the relationships between paper authors, co-authors and fields. In this work, we generated a pure classic citation graph.
2.2 Eccentricity
The eccentricity E(V) of a vertex V is the greatest hop distance between V and any other vertex. It can be thought of as how far a node is from the node most distant from it in the graph. This metric is important since many other properties are derived from it. These properties are as follows:
- The radius R of a graph is the minimum eccentricity of any vertex:
  $R = \min\{E(v_i)\},\ i = 1, 2, 3, \ldots, n$    (1)
- The diameter D of a graph is the maximum eccentricity of any vertex in the graph. That is, D is the greatest distance between any pair of nodes:
  $D = \max\{E(v_i)\},\ i = 1, 2, 3, \ldots, n$    (2)
- A central node in a graph of radius R is one whose eccentricity equals R.
- A peripheral vertex in a graph of diameter D is one that is at distance D from some other node.
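The definitions above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the five-node undirected path graph and the names `eccentricity`, `adj`, `central` and `peripheral` are hypothetical and do not come from the experiment. Hop distances are obtained with a breadth-first search, which is sufficient when every link counts as one hop.

```python
from collections import deque

def eccentricity(adj, source):
    """Greatest hop distance from `source` to any vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit gives the shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

# Hypothetical toy graph: the undirected path a - b - c - d - e
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

ecc = {v: eccentricity(adj, v) for v in adj}
radius = min(ecc.values())             # equation (1)
diameter = max(ecc.values())           # equation (2)
central = [v for v in adj if ecc[v] == radius]
peripheral = [v for v in adj if ecc[v] == diameter]

print(ecc)
print("radius:", radius, "diameter:", diameter)
print("central:", central, "peripheral:", peripheral)
```

For this path graph the middle vertex is the only central node and the two end vertices are the peripheral ones.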
Figure 1 shows an example of graph diameter and radius based on the eccentricity calculation. As can be observed, eccentricity is the basis of several metrics and parameters in graph theory.
Figure 1: Example of Graph diameter and radius
3 EXPERIMENT
We implemented a web crawler to collect paper information from the Citeseerx website [9]. Our crawler harvested each paper’s name, date of publication, authors and total number of citations. Unfortunately, the number of scientific papers is massive, and it is hard, if not impossible, to harvest all of them; the IEEE database alone contains more than 3 million papers. To tackle this impediment, we concentrated on harvesting papers from one field, ad-hoc networking. We started with a seed list of one paper, and the crawler harvested the rest. The harvesting process started on 10 February 2014 and finished in 3 days.
Subsequently, the collected data was used to generate four directed graphs. The first graph is a pure classic graph that consists of all the collected papers. The three other graphs were constructed based on the time gap between the old (cited) paper and the new (citing) paper. We selected gaps of two, five and ten years between the old and the new papers. However, not all of the collected papers have a publication date in the Citeseerx repository. Nevertheless, the number of papers with complete information was large enough to generate meaningful results. After generating these graphs, Python was used to calculate the eccentricity of all nodes in the four graphs. Table 1 shows a summary of the collected data. In addition, Table 2 shows a summary of the results for the four graphs.
Table 1: Harvested Data
Properties   Values
Nodes        91211
Links        221677
Table 2: Graphs properties
G-type      Diameter   Radius   Nodes   Links
Graph(2)    6          0        10495   34158
Graph(5)    6          0        39760   73254
Graph(10)   7          0        48300   102142
Pure        14         0        91211   221677
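The construction of the time-gap graphs can be sketched as follows. This is a hedged illustration, not the script used in the experiment: it assumes that a link is kept when the citing paper appeared at most `max_gap` years after the cited one, and that papers without a publication date are left out of the time-gap graphs. The function `build_gap_graph` and the toy records are hypothetical.

```python
def build_gap_graph(citations, year, max_gap=None):
    """Return (nodes, links) for citations whose year gap is at most max_gap.

    citations: iterable of (citing, cited) pairs, i.e. new paper -> old paper.
    year:      dict mapping paper id -> publication year, or None if missing.
    max_gap:   None keeps every link (the "pure" graph).
    """
    links = set()
    for citing, cited in citations:
        if max_gap is not None:
            y_new, y_old = year.get(citing), year.get(cited)
            if y_new is None or y_old is None:
                continue               # assumption: undated papers are skipped
            if not 0 <= y_new - y_old <= max_gap:
                continue               # gap too large (or inconsistent dates)
        links.add((citing, cited))
    nodes = {paper for link in links for paper in link}
    return nodes, links

# Hypothetical toy records, not taken from the harvested data set
year = {"p1": 2001, "p2": 2002, "p3": 2006, "p4": 2012, "p5": None}
citations = [
    ("p2", "p1"), ("p3", "p1"), ("p3", "p2"),
    ("p4", "p1"), ("p4", "p3"), ("p5", "p4"),
]

for gap in (2, 5, 10, None):
    nodes, links = build_gap_graph(citations, year, gap)
    label = "Pure" if gap is None else f"Graph({gap})"
    print(label, len(nodes), len(links))
```

Under this assumed rule each larger gap keeps a superset of the links of the smaller one, which matches the growth of the node and link counts from Graph(2) to the pure graph in Table 2.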
4 RESULTS
The generated graphs were used as input to our software, which computes the eccentricity value of each node. To compute eccentricity, we ran Dijkstra's algorithm on the four graphs to extract the shortest path between each node and all other nodes. The complexity of our algorithm is O(n^2 log n), which is high. However, in this work we did not focus on implementing an efficient algorithm to compute eccentricity.
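A minimal sketch of this per-node shortest-path computation is given below. It is not the software used in the experiment. It assumes that the eccentricity of a node is taken over the nodes reachable from it along citation links, a convention the text does not state explicitly; under that assumption a paper that cites nothing in the data set has eccentricity 0. The names `dijkstra_hops` and `all_eccentricities` and the toy graph are hypothetical. Since every link counts as one hop, a breadth-first search would return the same distances.

```python
import heapq

def dijkstra_hops(adj, source):
    """Shortest hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                   # stale heap entry, already improved
        for v in adj.get(u, ()):
            nd = d + 1                 # every citation link costs one hop
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def all_eccentricities(adj):
    """Run Dijkstra once per node and keep the largest distance found."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    return {u: max(dijkstra_hops(adj, u).values()) for u in nodes}

# Hypothetical toy citation graph: links go from citing (new) to cited (old)
adj = {"p4": ["p3", "p2"], "p3": ["p2"], "p2": ["p1"], "p1": []}

ecc = all_eccentricities(adj)
print(ecc)
print("diameter:", max(ecc.values()), "radius:", min(ecc.values()))
```

Running one single-source search per node is what makes the overall cost grow quickly with the number of nodes.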
Figure 2: Eccentricity of the Constructed Graphs
Figure 2 shows the nodes’ eccentricity for the constructed graphs. We can observe from the figure the difference between the pure graph and the time-line graphs. The large difference between the three time-line graphs and the pure graph is the result of the incomplete or missing data harvested from Citeseerx: the date of publication is missing for more than 30% of the collected papers. Nevertheless, the results demonstrate that the eccentricity value of nodes increases with time. However, this increase is tiny, which means that the ‘rich get richer’ and that new nodes do not acquire new nodes or leaves of their own for a long time. This means that the graph diameter grows with time, but only slightly (from 6 to 7 over 10 years).
Figure 3: CDF of Node Eccentricity
Moreover, we can notice from the figure that the smallest value of eccentricity is zero. This means that our graph is not a connected graph and contains disconnected nodes or small subgraphs. These disconnected nodes or subgraphs are common in graphs (2, 5 and 10). However, over time their number is reduced and the graph becomes a connected graph, as in the pure graph.
Figure 4: Relationship between Citation (Node In-degree) and Eccentricity
Figure 3 shows the CDF of the eccentricity of nodes in the pure classic graph. From the figure we can observe that 50% of the nodes have an eccentricity value of less than 5 hops. In addition, less than 0.3% of the nodes have a very high eccentricity value of over 10 hops. These nodes are new nodes that cite papers that are themselves not old. These data indicate that quality scientific papers will be cited even if they are new. However, older papers published in the field receive the highest citation counts. In other words, the rich get richer.
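The CDF reading used for Figure 3 can be reproduced from a list of per-node eccentricity values with a sketch such as the one below. The sample values are hypothetical and chosen only to show the calculation; they are not the measured data, and the names `empirical_cdf` and `share_over_10` are illustrative.

```python
from collections import Counter

def empirical_cdf(values):
    """Return sorted (x, F(x)) pairs, where F(x) is the fraction of values <= x."""
    counts = Counter(values)
    total = len(values)
    cdf, running = [], 0
    for x in sorted(counts):
        running += counts[x]
        cdf.append((x, running / total))
    return cdf

# Hypothetical eccentricity values for ten nodes
sample = [0, 0, 1, 2, 2, 2, 3, 4, 4, 11]

cdf = empirical_cdf(sample)
share_over_10 = sum(1 for e in sample if e > 10) / len(sample)

for hops, fraction in cdf:
    print(hops, fraction)
print("share of nodes above 10 hops:", share_over_10)
```

Statements such as "50% of the nodes have an eccentricity of less than 5 hops" correspond to reading the hop value at which the cumulative fraction reaches 0.5.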
5 CONCLUSION
Graph theory is at the core of network science. Graph parameters and metrics have been used to study many aspects and phenomena in networks: power laws, small worlds, network growth and sustainability have all been studied. In this work, we studied the eccentricity of nodes in a citation network. Citeseerx was crawled and ad-hoc networking papers were harvested. Dijkstra's algorithm was run on the generated graphs to compute eccentricity. Our results show that there is a connection between the eccentricity value and the small-world phenomenon. In addition, the growth pattern of the eccentricity value follows the ‘rich get richer’ property of small-world networks.
REFERENCES
[1] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On power-law relationships of the internet topology,” in Proceedings of SIGCOMM’99, Cambridge, MA, September 1999.
[2] P. Gill, M. Schapira, and S. Goldberg, “Modeling on quicksand: dealing with the scarcity of ground truth in interdomain routing data,” SIGCOMM Comput. Commun. Rev., Jan. 2012.
[3] M. Masoud, X. Hei, and W. Cheng, “A graph-theoretic study of the flattening internet as topology,” in IEEE ICON, Dec. 2013.
[4] P. Mahadevan, D. Krioukov, M. Fomenkov, X. Dimitropoulos, k. c. claffy, and A. Vahdat, “The Internet AS-level topology: three data sources and one definitive metric,” SIGCOMM Comput. Commun. Rev., vol. 36, no. 1, pp. 17–26, Jan. 2006.
[5] R. Albert, H. Jeong, and A.-L. Barabasi, “Diameter of the world-wide web,” Nature, pp. 130–131, 1999.
[6] S. N. Dorogovtsev and J. F. F. Mendes, “Language as an evolving word web,” Proc. Roy. Soc. London Ser. B, vol. 268, pp. 2603–2606, 1999.
[7] K. Healy, “A co-citation network for philosophy,” June 2013.
[8] Eccentricity, http://www.wikipedia.com/eccenticity
[9] CiteSeerX, http://citeseerx.ist.psu.edu/