Robust Communities Detection in Joint-patent Application Networks: A Consensus Approach
Carlo Drago and Ivan Cucco
University of Naples “Federico II” Department of Economics and Statistics
INSNA Sunbelt 2013, Hamburg
Outline
- Research Problem
- Method
- Application to a Joint Patenting Network
- Results and Discussion
- Conclusions and Directions for Future Research
Research Problem
Research Framework
This work is part of REPOS, a research project involving several universities in Italy and the University of Ljubljana.
- Aim: to develop methodologies for the evaluation of Network-Based Policies (NBP) in favour of innovation
- Emphasis on the effects of NBP on innovative networks
- Empirical focus on Italian government-sponsored technological districts (TDs) in aerospace, biotechnologies, nanotech and new materials
- Several works within the project analyze cooperation networks among TD members and partners
- We are interested in integrating research on joint participation in TD projects with the analysis of the patenting networks in which members/partners are involved before and after the establishment of TDs
- BUT: to date there is still little joint patenting activity between members/partners
- We therefore look at larger patenting networks that include TD members/partners as well as their co-applicants
- These networks are relatively large (for example, the post-2004 patenting network for one of the districts includes about 6,000 nodes)

Research Aim
The aim is to detect the stable patenting communities to which TD members/partners belong, and to track their evolution over time
Community Structure
A network has a community structure if its nodes can be grouped into densely connected sets, with "many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters" (Fortunato 2010).
In general:
- One of the main reasons for detecting communities is to find the characteristic pattern of each community (related, for example, to specific node attributes)
- At the same time, different communities can serve different functions within a network
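The definition above can be checked on a toy example. The following is a minimal sketch, not part of the original slides: the graph, the node names and the helper `edge_split` are hypothetical, and the code only counts how many edges fall within and between the communities of a given partition.

```python
from collections import Counter

def edge_split(edges, membership):
    """Count edges inside communities versus edges between communities."""
    counts = Counter()
    for u, v in edges:
        kind = "within" if membership[u] == membership[v] else "between"
        counts[kind] += 1
    return counts["within"], counts["between"]

# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
membership = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

within, between = edge_split(edges, membership)
print(within, between)  # 6 1
```

A partition with many "within" edges and few "between" edges matches the quoted definition; real detection algorithms search for such partitions rather than taking them as given.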
Figure: Community structure in Zachary's karate club network (Zachary 1977)
Method
Community Detection
- "Complex systems are usually organized in compartments, which have their own role and/or function."
- "In the network representation, such compartments appear as sets of nodes with a high density of internal links, whereas links between compartments have a comparatively lower density."
- "These subgraphs are called communities, or modules, and occur in a wide variety of networked systems."
See: Lancichinetti and Fortunato (2009), Girvan and Newman (2002) and Fortunato (2009).
- Communities can be flat or separated, overlapping, or nested in a hierarchical structure
- Community detection algorithms aim at identifying the modules and their hierarchical organization by considering only the graph topology (Fortunato and Castellano 2007)
- Community detection techniques usually do not model the network; they adopt an algorithmic approach to detect patterns in it
Community Detection Algorithms
A relevant problem in the literature is identifying communities in a network when:
- their exact number is unknown
- the communities can have unequal sizes and densities
Various algorithms and methods have been proposed for accomplishing this task. The choice of the algorithm is, however, problematic because:
- In an exploratory framework where no a priori information is available on the communities in the network, choosing the "right" algorithm may be unfeasible
- Different methods perform differently and can suffer from different biases (Leskovec, Lang and Mahoney 2010)
- Each method seems to be more appropriate for some specific network typologies
- The partitions generated by different methods do not necessarily match (Good, de Montjoye and Clauset 2010)
- When a given method produces several outputs, it is difficult to consider a single partition as being more representative of the actual community structure (Lancichinetti and Fortunato 2012)
- We use a consensus algorithm to assess the stability of the detected communities
- Drago and Balzanella (2013) propose using an ensemble of community detection algorithms and then finding a consensus partition that combines the information produced by the various community detection methods
- We apply different ensembles of methodologies to the same relational data and use statistical procedures to evaluate the level of agreement between the different procedures (see Lancichinetti and Fortunato 2012)
- As a first illustration, the methodology is applied to a joint patent application network drawn from the OECD Regpat database
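The slides do not say which statistics are used to measure agreement between procedures. As one possible illustration, the sketch below computes the Rand index between two partitions; the choice of this index, the function name `rand_index` and the toy partitions are assumptions made here, not the authors' procedure.

```python
from itertools import combinations

def rand_index(part_a, part_b):
    """Share of node pairs on which two partitions agree: both put the
    pair in the same community, or both put it in different communities."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(part_a)), 2):
        same_a = part_a[i] == part_a[j]
        same_b = part_b[i] == part_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total

# Hypothetical community labels for six nodes from two methods.
# Label values are arbitrary; only the grouping matters.
part_a = [1, 1, 1, 2, 2, 2]
part_b = [5, 5, 6, 7, 7, 7]

print(rand_index(part_a, part_b))
```

Because the index compares groupings rather than label values, two methods that number their communities differently but group the nodes identically score 1.0.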
Consensus Community Detection Algorithm
1. We start from the network adjacency matrix
2. We consider an ensemble of different community detection algorithms, which yield different results
3. We collect the results in the Consensus Matrix
4. We apply Multiple Correspondence Analysis (Le Roux and Rouanet 2009) to the Consensus Matrix
5. We perform a Hierarchical Cluster Analysis on the first two dimensions (the most relevant), using the Euclidean distance and the Ward method (Härdle and Simar 2007)
6. We use a dendrogram to explore the different partitions
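Multiple Correspondence Analysis works on categorical variables coded as an indicator (one-hot) matrix. The sketch below shows how the collected memberships could be coded that way, assuming each method is treated as a categorical variable and each node as an observation. The method names, the labels and the helper `indicator_matrix` are hypothetical, and the MCA step itself is not implemented here.

```python
def indicator_matrix(memberships):
    """One-hot encode a {method: [label per node]} table.

    Returns the column keys as (method, community label) pairs and one
    0/1 row per node.
    """
    columns = [(m, lab) for m, labels in memberships.items()
               for lab in sorted(set(labels))]
    n_nodes = len(next(iter(memberships.values())))
    rows = [[1 if memberships[m][i] == lab else 0 for m, lab in columns]
            for i in range(n_nodes)]
    return columns, rows

# Hypothetical output of three detection methods on five nodes.
memberships = {
    "method_a": [1, 1, 2, 2, 2],
    "method_b": [1, 1, 1, 2, 2],
    "method_c": [3, 3, 4, 4, 4],
}
columns, Z = indicator_matrix(memberships)
for row in Z:
    print(row)
```

Each row contains exactly one 1 per method, so nodes that are grouped together by most methods have similar rows and would lie close together on a factor map.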
The Algorithm
We use several well-known community detection methods:
- Edge Betweenness community (Newman and Girvan 2004)
- Walktrap community (Pons and Latapy 2005)
- Fastgreedy community (Clauset, Newman and Moore 2004)
- Spinglass community (Sathik and Rasheed)
- Leading Eigenvector community (Newman 2006)
- Infomap community (Rosvall, Axelsson and Bergstrom 2009)
- Label Propagation (Raghavan, Albert and Kumara)
- Blockmodeling as a tool in community detection (Zhao, Levina and Zhu 2011; Karrer and Newman 2011)
- The Consensus Matrix makes it possible to compare the partitions directly
- The factor map of the methods helps identify patterns of similarity and difference among the community detection methods
- A relevant problem is identifying the stable communities that can be detected using an ensemble of different methods
- The factor map of the nodes makes it possible to identify the different communities (groups of nodes)
- Finally, the clusters are obtained through a cluster analysis using the Euclidean distance and the Ward method
- The number of clusters is chosen so as to explore different partitions (the approach is exploratory)
- The final result is a set of clusters that represent the stable communities
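For reference, the Ward criterion can be written in its standard textbook form. The notation is introduced here for illustration and does not come from the slides: $A$ and $B$ are two clusters of nodes, and $c_A$, $c_B$ are their centroids in the plane of the first two MCA dimensions.

```latex
\Delta(A, B) \;=\; \frac{|A|\,|B|}{|A| + |B|}\;\lVert c_A - c_B \rVert^{2}
```

At each step of the hierarchical procedure the pair of clusters with the smallest $\Delta(A, B)$ is merged, that is, the merge that least increases the total within-cluster sum of squared Euclidean distances.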
Application to a Joint-Patent Network
Applying the Method on Joint-Patent Networks
- To illustrate the application of community detection techniques (CDTs) to innovation networks, a joint patent application network was constructed starting from the three Italian branches of a leading electronics firm
- We present the preliminary results of the analysis and describe the next steps
Data
Source: OECD Regpat database, which reports patent applications to the European Patent Office and applications filed under the Patent Cooperation Treaty

Data Structure
- Originally two-mode data (applicant(s) - patent)
- Projected onto a one-mode applicant-applicant network
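The two-mode to one-mode projection can be sketched as follows. This is a minimal illustration under assumptions: the records are invented (they are not Regpat data), the function name `project_applicants` is hypothetical, and each pair of co-applicants on the same patent is counted as one joint application.

```python
from collections import Counter
from itertools import combinations

def project_applicants(patent_applicants):
    """Project a two-mode patent -> applicants mapping onto a weighted
    one-mode applicant-applicant edge list (weight = shared patents)."""
    weights = Counter()
    for applicants in patent_applicants.values():
        # sorted(set(...)) removes duplicates and fixes the pair order
        for a, b in combinations(sorted(set(applicants)), 2):
            weights[(a, b)] += 1
    return weights

# Hypothetical records, not taken from Regpat.
patent_applicants = {
    "P1": ["FirmA", "UnivX"],
    "P2": ["FirmA", "UnivX", "FirmB"],
    "P3": ["FirmB"],
}
weights = project_applicants(patent_applicants)
for pair, w in sorted(weights.items()):
    print(pair, w)
```

Patents with a single applicant, such as `P3` here, contribute no edges to the projected network.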
The Network
1. All patent applications filed by the three branches of the firm after 1990 were extracted
2. A first node set was created by listing all the co-applicants (firms, universities, research institutes) reported on the extracted applications
3. 64 individual nodes were identified in this step using (a) harmonized names in the OECD Harmonized Applicants Names database and (b) manual checks for ambiguous cases
4. All patent applications filed by the identified actors were extracted
5. The process was repeated, resulting in a final node set of 1,703 nodes
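The node-set construction described above resembles a snowball expansion from seed nodes. The sketch below is one way to express it, under stated assumptions: the lookup table `coapplicants_of`, the actor names and the `max_rounds` parameter are hypothetical, and name harmonization and manual checks are not modelled.

```python
def snowball(seeds, coapplicants_of, max_rounds=2):
    """Grow a node set by repeatedly adding the co-applicants of the
    nodes discovered in the previous round."""
    nodes = set(seeds)
    frontier = set(seeds)
    for _ in range(max_rounds):
        found = set()
        for node in frontier:
            found.update(coapplicants_of.get(node, ()))
        frontier = found - nodes      # keep only newly discovered actors
        if not frontier:
            break
        nodes |= frontier
    return nodes

# Hypothetical co-applicant lookup (actor -> co-applicants on its patents).
coapplicants_of = {
    "Branch1": ["FirmA"],
    "Branch2": ["FirmA", "UnivX"],
    "Branch3": [],
    "FirmA": ["FirmB"],
    "UnivX": ["UnivY"],
    "FirmB": ["FirmC"],
}
node_set = snowball(["Branch1", "Branch2", "Branch3"], coapplicants_of)
print(sorted(node_set))
```

With two rounds, `FirmC` is not reached because it is three steps away from the seeds; the number of rounds therefore controls how far the network extends beyond the starting firm.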
Edge Cuts
- All joint patent applications between the 1,703 nodes were extracted and transformed into a one-mode valued network (applicant-applicant)
- Edge weights equal the number of joint patent applications
- To remove occasional collaborations, we applied an edge cut to the network, dropping edges with fewer than five joint applications
- Isolates were removed, and the networks were binarized
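The edge cut, isolate removal and binarization can be sketched in a few lines. The weights below are invented, and the function name `edge_cut` and its `min_weight` parameter are hypothetical; the threshold of five follows the text.

```python
def edge_cut(weights, min_weight=5):
    """Keep edges with at least `min_weight` joint applications, drop the
    weights (binarize), and return the surviving edges together with the
    nodes that still have at least one edge (isolates are dropped)."""
    edges = {pair for pair, w in weights.items() if w >= min_weight}
    nodes = {n for pair in edges for n in pair}
    return edges, nodes

# Hypothetical valued network: (applicant, applicant) -> joint applications.
weights = {("A", "B"): 7, ("A", "C"): 2, ("B", "D"): 5, ("C", "E"): 1}

edges, nodes = edge_cut(weights)
print(sorted(edges))  # [('A', 'B'), ('B', 'D')]
print(sorted(nodes))  # ['A', 'B', 'D']
```

Nodes `C` and `E` become isolates once their weak ties are cut, so they disappear from the resulting network.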
Edge-Cut Network (at least 5 joint applications)
Network Descriptives
We start from the general descriptive features of the network, then explore the network to find the stable communities.

Network           vertices  edges  density  diameter  centralization  degree  betweenness
Edge-Cut Network       216    248     0.01    526.00            0.26    2.30       390.52
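Two of the reported descriptives can be cross-checked from the vertex and edge counts alone. The sketch below assumes an undirected simple graph and reads the "degree" column as the mean degree; both are assumptions, since the slides do not define the columns.

```python
def density(n_vertices, n_edges):
    """Density of an undirected simple graph: edges / possible edges."""
    return 2 * n_edges / (n_vertices * (n_vertices - 1))

def mean_degree(n_vertices, n_edges):
    """Average number of edges per vertex in an undirected graph."""
    return 2 * n_edges / n_vertices

print(round(density(216, 248), 2))      # 0.01
print(round(mean_degree(216, 248), 2))  # 2.3
```

Under these assumptions both values agree with the table (0.01 and 2.30), which suggests a sparse network in which a typical applicant has two or three strong partners.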
Figure: Edge-Cut Network (at least 5 joint applications; 216 nodes)
Figure: Edge-Cut Network: communities
Edge-Cut Network: Consensus Matrix
       1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  ...
me     1   2   3   4   4   4   1   5   5   5   4   4   6   5   5   5  ...
me2   13   3   8   6   6   6   2   6   5   5   6   6   1   5   5   5  ...
me3    1   7   2   3   3   3   1   3   5   5   3   3   3   5   5   5  ...
me5    1   2   9  14  14  14   1  10  10  10  14  14  13  10  10  10  ...
me6   10  13   2   8   8   8  10   8  11  11   8   8   8  11  11  11  ...
me7   16   6   3   1   1   1   5   1   2   2   1   1   8   2   2   2  ...
me8    1   2   3   4   4   4   5   4   6   6   4   4   4   6   6   6  ...
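One way to read the Consensus Matrix is to ask how often two nodes are placed in the same community across methods. The sketch below uses the first 16 columns shown above, assuming that columns index nodes and rows index detection methods; the helper `co_membership` is hypothetical and is not the authors' MCA-based procedure.

```python
# First 16 columns of the Consensus Matrix shown above (rows = methods).
consensus = {
    "me":  [1, 2, 3, 4, 4, 4, 1, 5, 5, 5, 4, 4, 6, 5, 5, 5],
    "me2": [13, 3, 8, 6, 6, 6, 2, 6, 5, 5, 6, 6, 1, 5, 5, 5],
    "me3": [1, 7, 2, 3, 3, 3, 1, 3, 5, 5, 3, 3, 3, 5, 5, 5],
    "me5": [1, 2, 9, 14, 14, 14, 1, 10, 10, 10, 14, 14, 13, 10, 10, 10],
    "me6": [10, 13, 2, 8, 8, 8, 10, 8, 11, 11, 8, 8, 8, 11, 11, 11],
    "me7": [16, 6, 3, 1, 1, 1, 5, 1, 2, 2, 1, 1, 8, 2, 2, 2],
    "me8": [1, 2, 3, 4, 4, 4, 5, 4, 6, 6, 4, 4, 4, 6, 6, 6],
}

def co_membership(consensus, i, j):
    """Share of methods placing nodes i and j (1-based) in the same community."""
    same = sum(labels[i - 1] == labels[j - 1] for labels in consensus.values())
    return same / len(consensus)

print(co_membership(consensus, 4, 5))  # 1.0: all seven methods agree
print(co_membership(consensus, 1, 7))  # four of the seven methods agree
print(co_membership(consensus, 8, 9))  # two of the seven methods agree
```

Pairs such as nodes 4 and 5, grouped together by every method, are candidates for a stable community, while pairs such as nodes 8 and 9 depend on the method chosen.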
Figure: Edge-Cut Network: MCA, methods
Figure: Edge-Cut Network: MCA, nodes
Figure: Edge-Cut Network: dendrogram
Figure: Edge-Cut Network: stable communities
Preliminary Results
- The three branches from which we started the construction of the network are in the same community, together with some of their district members and partners
- This community also includes an Italian university
- All other Italian universities belong to a separate community in which there are no private firms
- This is relevant because the technological district policies had among their aims cooperation between universities and firms. It is one of the points we should examine in more detail when we apply the methodology to all the firms in the district
- Incidentally, the remaining communities show strong geographical patterns
Advantages in using the method
There are clear advantages in using ensembles when different methods produce different information about communities:
- Measure the persistence of the co-participation of some nodes in the same community
- Overcome the biases of each method
- Understand what each method concretely tells us
Conclusions and Directions for Future Research
Conclusions and Future Extensions
Next steps in the application to the empirical research problem:
- Apply the method to networks elicited from all members of a TD
- Perform substantive analysis on the identified communities (node attributes)
- Identify the changes in community membership for TD members before and after the implementation of government NBP
- Analyze whether TD project partners tend, over time, to be located in the same patenting communities
- Differentiate across sectors (according to IPC codes)
References
A.V. (2012). Evidence of Networking in the European Research Area. Project financed by the 6th Framework Programme for Research, for the implementation of the specific programme "Strengthening the Foundations of the European Research Area" (Invitation to tender n DG RTD 2005 M 02 02).
Burger-Helmchen, T. (Ed.). (2013). The Economics of Creativity: Ideas, Firms and Markets (Vol. 60). Routledge.
Christ, J. (2009). The Geography and Co-Location of European Technology-Specific Co-Inventorship Networks. University of Hohenheim FZID Discussion Paper, (14-2010).
Drago, C. (2012). Stable Communities Detection. Mimeo.
Fortunato, S. (2009). arXiv:0906.0612.
Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3), 75-174.
Fortunato, S. (2013). Community structure in networks. Institute for Scientific Interchange Foundation.
Fortunato, S., & Castellano, C. (2007). Community structure in graphs. arXiv preprint arXiv:0712.2716.
Girvan, M., & Newman, M. E. J. (2002). Proc. Natl. Acad. Sci. USA, 99, 7821.
Härdle, W., & Simar, L. (2007). Applied Multivariate Statistical Analysis. Springer Verlag.
Lancichinetti, A., & Fortunato, S. (2009). Community detection algorithms: A comparative analysis. Physical Review E, 80(5), 056117.
Lancichinetti, A., & Fortunato, S. (2012). Consensus clustering in complex networks. Scientific Reports, 2.
Lancichinetti, A., Radicchi, F., Ramasco, J. J., & Fortunato, S. (2011). Finding statistically significant communities in networks. PLoS ONE, 6, e18961.
Le Roux, B., & Rouanet, H. (2009). Multiple Correspondence Analysis (Vol. 163). SAGE Publications, Incorporated.
Mascolo, C. (2013). Lecture 4: Modularity and Overlapping Communities. Lecture Notes, Cambridge University.
Moradi, F., Olovsson, T., & Tsigas, P. (2012). An evaluation of community detection algorithms on large-scale email traffic. In Experimental Algorithms (pp. 283-294). Springer Berlin Heidelberg.
Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577-8582.
Newman, M. E. J. (2013). Modularity, Community, Structure and Spectral Properties of Networks. Preprint physics 0602124 (PNAS in press).
Newman, M. E. J., & Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E, 69, 026113.
Tang, L., & Liu, H. (2010). Graph mining applications to social network