Filtered Patent Maps for Predicting Diversification Paths of Inventors and Organizations
Bowen Yan Singapore University of Technology and Design SUTD-MIT International Design Centre Email:
[email protected] Jianxi Luo Singapore University of Technology and Design SUTD-MIT International Design Centre Email:
[email protected] Abstract In a patent technology network map, almost all pairs of technology classes are connected, whereas most of the connections are extremely weak. This observation suggests the need and also the possibility to filter the network map by removing the negligible and noisy links. But link removal may reduce the power of the network for predicting the cross-field patent portfolio diversification of inventors and inventing organizations. This paper proposes a metric for such predictive power of a patent network, and a method that allows one to objectively choose a best tradeoff between predictive power and the removal of weak links. We show the results that identify filtered networks below the optimal tradeoff, and also remove a degree of arbitrariness compared with other filtering treatments from the literature. On that basis, we further demonstrate the use of filtered technology maps to visualize and analyze the main paths of patent portfolio diversification of a prolific inventor (Leonard Forbes) and a technology company (Google Inc.).
1
1 Introduction Recent studies have proposed to represent the technology space as a network built from patent data (Kay et al., 2014; Leydesdorff et al., 2014), in which nodes are patent technology classes representing diverse technology fields and links among them are weighted according to the knowledge relatedness or proximity between the technology fields (Engelsman and van Raan, 1994; Joo and Kim, 2010). In turn, such technology networks have been used to analyze the patent portfolio diversification or capability position movements of firms or regions across different technology fields in the technology space (Boschma et al., 2015; Breschi et al., 2003; Jaffe and Trajtenberg, 2002; Rigby, 2013; Kogler et al., 2013). These studies have generally shown that firms or regions tend to diversify into new technology fields that have high knowledge relatedness with their existing fields (Teece et al., 1994; Breschi et al., 2003; Nesta and Dibiaggio, 2005). In this paper, we aim to develop a proper technology network map whose links can well predict the likelihoods of diversification of innovation agents across the links, and also apply it to identifying the main paths of diversification of innovation agents in the total technology network map. In particular, it is found that the patent technology networks are almost fully connected, but most of connections are extremely weak. The presence of many weak links indicates an opportunity to remove them and derive a filtered network that can reveal a clear network structure for effective visualization and analysis of cross-field diversification. However, which links and how many links should be removed remains a research question. The necessary network filtering principle and method are still underdeveloped. In this paper, we introduce a method to filter the technology network, based on two main principles. The first principle is to remove as many weak links as possible. The second one is to maintain the overall power of the filtered network to predict the diversification of innovation agents (e.g., inventors, technology firms, regions) across technology fields throughout the network. Such prediction is the most important utility of the technology network map. However, these two principles conflict, because removing network links will most likely reduce the 2
network’s prediction power. Trade-offs must be made. Our method aims to reconcile this conflict to generate significantly filtered technology networks that maintain sufficiently high prediction power on diversification patterns of different types of innovation agents. We apply the method to filtering the original technology networks for the interests of predicting cross-field diversification of two types of innovation agents, i.e. inventors and R&D organizations, and show the superior diversification power of the filtered networks in comparison to others using alternative network filtering techniques from the literature. On that basis, in two case studies we further demonstrate the use of the filtered technology network maps to visualize and analyze the main diversification paths of a prolific inventor, i.e. Leonard Forbes, and a technology company, i.e. Google.
2. Literature Review 2.1 Inter-field Relatedness and Diversification Prior empirical studies have shown that firms and regions are more likely to diversify across technology areas with high knowledge relatedness. For instance, by examining the patterns of technological diversification of the firms in a few most developed countries, according to their records of patenting to the European Patent Office from 1982 to 1993, Breschi et al. (2003) found knowledge relatedness of technology fields is a key factor to determine firms’ technological diversification across fields. They used co-classification codes contained in patent documents to measure knowledge relatedness between technology classes. At the city level, Rigby (2013) showed that US cities’ entries into and exits from technology fields are highly related to the level of relatedness among technology fields. He used USPC (United States Patent Classification) classes to represent technology fields, and measured inter-field relatedness as the probability that a patent in class j will cite a patent in class i. Boschma et al. (2015) also found similar evidence that the probability for U.S. cities to enter a new technology field is significantly related to its level of relatedness with existing technology fields that the city has entered. In general, these studies have suggested that the network maps of patent technology 3
classes and measures of relatedness of these classes can be used to predict the likelihoods of diversification of firms and regions across pairs of technology classes. 2.2 Technology Network Maps and Filtering Several recent studies have constructed and visualized patent technology networks for such a purpose. For instance, Leydesdorff et al. (2014) constructed a technology network map, using IPC classes of USPTO patents as nodes. The weight of a link between two technology classes is measured using cosine similarity (Jaffe, 1986), i.e. the angular cosine value of the two vectors of citations from the patents of these two classes to all other classes respectively. When visualizing the network and detecting its community structure in Pajek, Leydesdorff et al. (2014) suggested that, “without a threshold for the cosine, the visualization is not informative.” However, they only reported the community detection results, given the threshold value of cosine > 0.2. That is, inter-field links with relatedness 0.55 on the maximum spanning tree (MST). A MST has the minimal set of strongest links that keeps the network connected (i.e., MST has specifically n-1 links, n is the total number of nodes in the network). Their choice of the threshold of 0.55 is determined based on the “rule of thumb” that a good network visualization has an average degree equal to 4 (or the number of links is twice of the number of nodes). The impact of this network filtering for visualization ease on the utility of the network to predict cross-field diversification is not clear. In sum, the patent mapping literature has suggested that weighted patent technology networks can be used to predict technology diversification of innovation agents, such as firms and cities. And, such networks are highly connected, and need to be filtered by removing the weakest or negligible links. The patent network mapping efforts reviewed above all conducted network filtering, while using different techniques. For network filtering, a threshold of link weight needs to be chosen, so links with values weaker than the threshold are removed. The threshold provides a stopping criterion of link removal. However, in the literature, the threshold was often chosen either arbitrarily or for the sake of visualization. In this study, we will present a method to decide the threshold of network filtering for the interest of using the technology network to predict diversification of innovation agents.
3. The Technology Network 3.1 Constructing the Technology Network 5
As others, we start with constructing a network of patent technology classes to represent the technology space. The 3-digit IPC (International Patent Classification) categories are used to represent technology fields as nodes in the network, following many other authors considering IPC classes the most suitable and stable representations of technology fields (Leydesdorff et al., 2014). The USPTO patent database is used as our patent data source. After removing a few undefined and empty classes, the resulting network has 121 nodes, which together contain 3,911,054 US patents from 1976 to 2010. To construct a patent technology network requires a measure of the relatedness between pairs of technology classes, which determine the weights of links in the network. We choose the “normalized co-reference” index, which is calculated as the number of shared references of the patents in a pair of classes normalized by the number of unique references of patents in either class. Its mathematical form is Co-Reference
Ci
Cj
Ci
Cj
(1)
where Ci and Cj are the numbers of backward citations (i.e., references) of patents in technology classes i and j; Ci
Cj
Ci
Cj
is the number of patents referenced in both technology classes i and j, and
is the total number of unique patents referenced in either technology classes i or j,
respectively. Thus, the index value is between [0,1], and indicates the similarity of the knowledge bases of a pair of fields defined by IPC technology classes. It is a variant of the Jaccard index (Jaccard, 1901; Small, 1973). Here we choose the normalized co-reference index to construct the network for later filtering analysis, because, as we will later show, this measure actually provides better predictive power of the resulting network on the diversification of innovation agents across fields in the network, than other measures of knowledge relatedness of patent technology classes, particularly cosine similarity which has been a dominant choice for this purpose (Kay et al., 2014; Leydesdorff et al., 2014).
6
3.2 Diversification Predictive Power A patent technology network’s predictive power on the diversification of innovation agents across technology fields (i.e. nodes) in the network is assessed as the Pearson correlation coefficient (PCC) between the knowledge relatedness values of all pairs of technology fields and the likelihoods that innovation agents diversify across the respective pairs of fields. The predictive power corresponds to the types (or levels) of innovation agents of interest. In this paper, we focus on individual inventors and inventive organizations (which can be technology firms, universities or R&D organizations). We calculate “inventor diversification likelihood” as the minimum of the pairwise conditional probabilities of an inventor having a patent in one technology class, given that this person also has patents in the other class in a given time period. To calculate this measure, we used the unique inventor identifiers from the Institute for Quantitative Social Science at Harvard University (Li et al., 2014). “Organization diversification likelihood” is calculated as the minimum of the pairwise conditional probabilities of an organization having a patent in one technology class, given that it also has patents in the other in a given time period. To calculate this measure, we used the unique assignee identifiers created by the National Bureau of Economics Research (NBER) (Hall et al., 2001). Figure 1 shows the correlations between the co-reference relatedness values of all pairs of 121 technology fields (represented by IPC patent classes) and the likelihoods for an average inventor or organization to diversify across respective pairs of technology fields. Specifically, using co-reference relatedness to determine link weights, the technology network’s power to predict inventor diversification is 0.927 (Pearson correlation coefficient), and its power to predict organization diversification is 0.649. In comparison, when cosine similarity is used as link weights, the network’s predictive power for inventor and organization diversifications is 0.407 and 0.226 respectively. For a robustness check, we also compared the predictive powers of coreference and cosine relatedness measures using data in shorter time periods. The co-reference measure has consistently stronger predictive power than the cosine measure.
7
Figure 1. The technology network mirrors the likelihood of inventor and organization diversification. Each dot represents a pair of technology classes. Both relatedness and diversification likelihoods values are calculated based on all patent records from 1976 to 2010.
Note that there are still alternative measures of knowledge relatedness. Other scholars have done focused research on the comparison and choices of relatedness measures (Joo and Kim, 2010; Klavans and Boyack, 2006b). We have also conducted an comparative analysis of 12 relatedness measures, and the results suggest normalized co-reference is a superior choice (Bowen and Luo, 2015). In the present paper, our focus is on network filtering given a specific technology network. Therefore, we will use the technology network with link weights measured by “normalized co-reference” in the later analysis. 3.3 Relatedness-Diversification Correlations for Inventors and Organizations Figure 1 also presents several noteworthy patterns. First, the relatedness—diversification correlation is generally stronger for individual inventors than organizations. This difference implies that cross-field diversification and related decisions are less constrained by inter-field knowledge relatedness for organizations than for individuals. A possible explanation is that organizations can invest and acquire knowledge more flexibly than individual inventors, whereas inventors do not have a wide scope of resources and must learn and master relevant knowledge
8
of a technology field in order to invent there. Such a difference in resources and capacities of organizations and individuals may have also resulted in our second observation in Figure 1— where knowledge relatedness is lower, implying stronger constraints to diversification, inventors are clearly less prone to diversify than organizations. In addition, for both organizations and inventors, the scattered dots are bounded in a “dagger” area pointing to the upper right. This pattern further suggests that extremely high relatedness can enable organizations and individuals to diversify similarly, resulting in smaller variance in diversification behaviors. 3.4 Structure of the Technology Network Figure 2 visualizes the full technology network with all 7,195 none-zero links. Among the total 121×120/2=7,260 pairs of technology classes, only 65 or 0.9% of them have zero link weight (i.e. relatedness value) and no direct connection. Thus, the network is almost fully connected and extremely dense.
Figure 2. The full technology relatedness network with all non-zero links. Several clusters of technology classes were identified using the Louvain community detection method (Blondel et al., 2008) and colored accordingly.
Figure 3 is the matrix visualization of the same network. Each row and column of the matrix denotes a technology class, and the brightness level of the entry in row i and column j 9
denotes the relatedness value between technology classes i and j. The rows and columns, representing technology classes, are sorted in the same order using the “average linkage clustering” algorithm to reveal several relatively dense areas in the network. Most importantly, the matrix visualization clearly reveals very large sparse areas (in dark blue colour), in which technology classes are only weakly connected with one another, despite the network’s almost fully connected status shown in Figure 2. That is, most pairs of technology classes are only weakly related.
Figure 3. Matrix representation of the technology network
Figure 4 further confirms the existence of a vast majority of extremely weak links in the network by showing the link distribution highly skewed to low relatedness values. Out of the total 7,260 theoretically possible links among 121 technology classes, 0.9% of them are equal to zero, 82.5% of them are smaller than 0.01, 99.7% of them have weights below 0.1, and the maximum weight is 0.243.
10
Figure 4. Distribution of links by weight
In sum, the technology class network is almost fully connected; but most links are extremely weak. This fact suggests the possibility to simplify the network by removing many of the negligible links. But, which and how many links should be removed? What is the threshold to cut off the links with values below it? In the following, we introduce our method to determine the threshold value, and the criterion for link removal stopping.
4. Network Filtering The purpose of network filtering is to remove as many weak links as possible. But link removal must be done under the constraint that the predictive power of the filtered network needs to be as high as possible. The main utility of the technology network for our purposes is to predict the likelihoods of inventor and organization diversification across the links between pairs of technology fields according to respective link weights. Figure 1 has shown that the original technology network based on the co-reference measure of link weights has a high predictive power. Removing too many links may reduce the prediction power of the filtered network. However, it is possible that the network predictive power can be improved when removing the weakest links that introduce noise and impairs the prediction. We will show such a case later. It
11
is also possible that the predictive power can remain stable or decrease only slightly when certain amount of negligible links is removed. In the following, we present the procedure to identify the link removal threshold for filtering networks so that they effectively predict the diversification of inventors and organizations across network links. Our filtering method starts with identifying the maximum spanning tree (MST) of the original full technology network, and then incrementally adds the rest of links, one by one, in the order of decreasing weights to the MST. Every time when a link is added back, calculate the predictive power of the updated network. When the desired predictive power is reached for the first time, the filtered network at this point contains the smallest number of links to obtain that predictive power. The first step of our method is to derive the skeleton of the technology network as a Maximum Spanning Tree (MST). Figure 5 is the MST for the original technology network shown in Figure 2. The MST only includes 120 links that connect all 121 technological classes in the network and maximizes the sum of the link weights. The MST guarantees that all 121 technological classes are minimally connected by the strongest 120 possible links among them.
12
Figure 5. Maximum Spanning Tree. Vertex sizes correspond to the numbers of patents in respective IPC patent classes; vertex colors denote communities identified by the Louvain community detection method (Blondel et al., 2008)
Then, we add back to the MST one additional link at a time in the order of decreasing link weight, until reaching the original full network of 7,195 links. Every time a weaker link is added to the network, we calculate the technology network’s predictive power, i.e. Pearson correlation coefficient between the filtered network’s link weights (i.e. co-reference value) and the likelihoods of inventors and organizations diversifying crossing corresponding links. Figure 6 traces the predictive power of the technology network for inventor and organization diversification respectively, as we add more and more links back to the MST. The left most point of each curve is the MST with 120 links. The right most point is the original network that includes all 7,195 links. As shown in Figure 6, the curve of predictive power for inventor diversification is generally above that for organization diversification. As discussed earlier, 13
organizations’ cross-field diversifications are less constrained (thus less predictable) by interfield relatedness than individual inventors. These two curves show the tradeoff between predictive power and link removal: link reduction incurs the loss of predictive power, whereas predictive power increases at the cost of the inclusion of additional links in the network.
Figure 6. The change of predictive power with the increase of links added to MST.
For the filtered network to predict inventor diversification, predictive power increases as more links with lower relatedness values are added back to the network (along the upper curve in Figure 6). The left most point, i.e. MST, has the lowest predictive power of 0.66, indicating that the network filtered at this level does not provide sufficient pathways for inventors to diversify across. The predictive power climbs up to 0.920 rapidly when the strongest 31.2% of the links (i.e. the strongest 2,242 links) have been included in the network. The highest possible predictive power is 0.927. After that, the increase in predictive power with adding more links almost ceased.
14
In other words, adding more links is not useful for increasing the network’s predictive power. This specific curve shape indicates the filtered network only needs to include a small fraction of strong links to reach a rather high level of predictive power. For example, to reach a predictive power of 0.9 (96.8% of the total power), the filtered network only needs to include the 120 MST links plus the additional next-strongest 697 links, totaling 817 links (11.4% of the original network links). This is the “star” point on the upper curve in Figure 6. Figure 7 visualizes this filtered network with 817 links. A vast majority of links has been removed without significantly affecting the network’s predictive power. The structure of the filtered network is more apparent than the full network in Figure 2.
Figure 7. Filtered network for the use of predicting inventor diversification. The network includes 120 MST links plus additional 697 strongest links of the original network (i.e., 817 links in total), and has a predictive power of 0.9. Vertex sizes correspond to the numbers of patents in respective IPC patent classes; vertex colors denote communities identified by the Louvain community detection method.
For the filtered network to predict organization diversification, predictive power increases initially as more links with lower levels of relatedness are added back to MST (along the lower curve in Figure 6). After approximately 61.5% or 4,428 links have been added back to MST, the increase of predictive power has reached the peak value of 65.1%. After the highest point, the
15
predictive power actually decreases when additional weaker links are included into the network. This suggests that the weakest 38.5% links do not contribute to prediction accuracy, but actually introduce noise to reduce organization diversification prediction power of the network. Therefore, a filtered network without the weakest 38.5% links can actually have higher predictive power than the original full network, which was 0.649. This result suggests link removal does not always reduce predictive power of the network, but sometimes improves it. In general, the specific shape of the curve in Figure 6 indicates that a vast majority of links of the original network can be removed without significantly losing its predictive power on organization diversification. For example, if one can afford the predictive power to drop slightly from 0.649 to 0.6, the weakest 6,112 links can be removed from the network. As a result, it only includes the 120 links in MST plus additional 963 strongest links, totaling 1,083 links, i.e. 15.1% of the total number of original links. This is the “start” point on the lower curve in Figure 6. Figure 8 visualizes this filtered network with 1,083 links for predicting organization diversification, which exhibits a more clear structure than that of the full network in Figure 2. Figure 8 looks similar to Figure 7, because it only adds 266 weaker links.
Figure 8. Filtered network for predicting organization diversification. The network includes 120 MST links plus additional 963 strongest links of the original network, and a predictive power of 0.60. 16
Such filtering can be tuned up to any desired level of diversification predictive power by adjusting the amount of strongest links to add back to the MST. One can make trade-offs between prediction power and link removal, following along the curves in Figure 6. Therefore, depending on different interests and needs, varied filtering strategies may be reasonable. In the following, we compare the diversification prediction powers of the filtered networks in Figures 7 and 8 with those from alternative network-filtering methods used by Hidalgo et al. (2007) and Klavans and Boyack (2006a) which we reviewed in section 3, as well as Maximum Spanning Tree (MST) and Planar Maximally Filtered Graph (PMFG) which are two general methods of extracting the most representative set of links from a network. Details of the networks filtered by the above-mentioned methods are as follows: A) Maximum Spanning Tree (MST) contains the minimal set of the strongest links that keeps all the nodes of the original network connected as a single component. In the context of our paper, the minimum number of links required to connect 121 nodes is 120. The 120 links also need to have the maximized sum of total weights. Figure 5 is the MST of the original technology network in Figure 2. The MST shows the backbone of the technology space. B) Planar Maximally Filtered Graph (PMFG) contains all the strongest links, which can be kept under the constraint of being representable on a plane without any link crossing (planar graph). Tumminello et al. (2005) first introduced PMFG to find the most representative filtered sub-graph of the correlation network of stocks in equity markets, and showed that PMFG always contains the MST as a sub-graph and also preserves the hierarchical organization of MST, while containing a larger amount of links. In the context of our paper, the PMFG of the original technology network includes the strongest 357 links of which the 120 links in MST are a sub-set. C) Hidalgo et al. (2007) added additional strongest links (in the order of decreasing weights) to the MST, till the point that the number of links is twice of the number of nodes for best visualization, in their search for a good visualization of the product space. In the context of 17
the present paper, we add another 122 strongest links additional to the 120 MST links to result in a network with 242 links and 121 nodes. D) Klavans and Boyack (2006a) added links back to the network from zero, in the order of decreasing weights, until all unique nodes are connected into the network, in visualizing the networks of academic papers. In the context of the present paper, the network resulted from their approach includes the 2,273 strongest links. Their method is equivalent to removing the weakest links in the order of increasing weights, till the point that the removal of one additional stronger link would cause the network to become disconnected.
Table 1 reports the predictive powers of the filtered networks resulting from these four alternative filtering methods reviewed above. Only PMFG is below the trade-off curve and clearly not an optimal choice, because many points on the trade-off curve have both better predictive power and fewer links than the PMFG. For example, the method of Hidalgo et al. (2007) leads to filtered networks that contain fewer links than PMFG, but achieve a higher predictive power. PMFG’s requirement of avoiding link crossing on a plane leads to the inclusion of relatively weaker links, in contrast to Hidalgo et al. (2007) adding links in descending order of weights to MST. MST is the start point of the trade-off curve, but its predictive power is the lowest as a result of eliminating too many useful network links. The network filtered using the method of Klavans and Boyack (2006a) has the highest predictive powers on both inventor and organization diversification, as it contains many links than MST, PMFG and Hidalgo et al. (2007). One can continue to remove many links at a slight loss of predictive power. For example, it contains 20% more links to gain a 2% better predictive power than our network specifically filtered for predicting inventor diversification, and contains 17% more links to gain a 4% better predictive power than our network specifically filtered for predicting organization diversification. In sum, this network more than doubles the number of links of our two filtered networks to derive only slightly higher predictive powers. These two filtered networks shown in Figures 7 and 8 lie between Klavans and Boyack (2006a) and 18
Hidalgo et al. (2007) in respective tradeoff curves in Figure 6. Table 1. Comparison of Predictive Powers of Different Filtered Networks Networks (all of which have 121 connected nodes) Original network Filtered network for predicting inventor diversification Filtered network for predicting organization diversification A) Maximum Spanning Tree (MST) B) Planar Maximally Filtered Graph (PMFG) C) Minimum set of strongest links that contain MST and are 2 x number of nodes (Hidalgo et al., 2007) D) Minimum set of strongest links to connect all nodes (Klavans and Boyack, 2006a)
Number of links after filtering 7,195 817
Power to predict inventor diversification 0.93 0.90
Power to predict organization diversification 0.65 N/A
1,083
N/A
0.60
120 357
0.66 0.77
0.34 0.43
242
0.81
0.46
2,273
0.92
0.64
In particular, the curves in Figure 6 suggest no single best threshold but a segment of feasible threshold values to choose from for filtering. Any point in the feasible segment, which has sufficient extent of filtering (i.e., keeping fewer than a threshold amount of links) and high enough predictive power (i.e., higher than a minimum requirement of predictive power), can be considered an acceptable threshold for filtering. For example, if one can accept the organization diversification prediction power 0.46, the network of 242 links resulting from the method of Hidalgo et al. (2007) will be acceptable. If one can accept the inclusion of 1,083 links after filtering, the filtered network in Figure 8 will acceptable and significantly improves predictive power to 0.6, for organization diversification. In general, with the tradeoff curves, one can more objectively choose a tradeoff between predictive power and the extent of removal of weak links. Our filtering strategy is fundamentally different from those from the literature in that it aims to maintain the effectiveness of using the network to predict technology diversification (i.e. patent portfolio diversification) of innovation agents. It is supported by the new metric of diversification prediction power of a patent network, and the “link removal—predictive power” trade-off curves (shown in Figure 6). In the following, we demonstrate the use of these two filtered networks (Figures 7 and 8) to visualize and analyze the main diversification paths of inventors and technology firms, using an overlay map method.
19
6. Overlaying Filtered Networks with Main Paths of Technology Diversification Innovation studies have suggested that new technological knowledge and capabilities of innovation agents, e.g. inventors, firms or regions, are generally learned or built up incrementally via an evolutionary path, in which their future knowledge positions are shaped by the past ones (Winter, 2000; Nelson and Winter, 2009; Luo et al., 2014). Learning theories (Winston, 1992) further suggested that it will be easier for individuals and organizations to learn and enter new fields that are proximate to the ones that have already understood, because the required knowledge and capabilities for innovation in the current and new fields are highly related or similar to each other (Breschi et al., 2003; Wuyts et al., 2005; Nootebooma et al., 2007). Therefore, technology diversification and relevant capability building of an innovation agent can be viewed as its gradually navigating or searching from prior fields to future ones throughout the technology space. Herein we visualize and analyze the trajectories of such navigation and search for innovation of specific innovation agents, using network maps representing the total technology space. Specifically, we adopt the network filtered for predicting inventor diversification (Figure 7) and the one filtered for predicting organization diversification (Figure 8) as the respective background map, and overlay the map with the paths of search for innovation and diversification of a notable individual inventor, Leonard Forbes (see Figure 10), and a notable technology company, Google (see Figure 11). On the background map, we visualize and highlight not only the patent technology classes where the inventor or the company had patents over the years, but also the main diversification paths from the technology classes that the innovation agent entered earlier to the ones that it entered later. In the overlay technology maps (e.g. Figures 10 and 11), the highlighted nodes in blue color represent the technology fields that an innovation agent had been granted patents during 19762010, and are also labeled with their IPC class titles and the earliest years in which the agent had been granted a patent in the respective patent class. The intensity of the blue color of a 20
highlighted node corresponds to the number of patents of the focal innovation agent in the respective IPC class. An arrowed link in purple color highlights the strongest (in terms of relatedness value) link to a technology field of an innovation agent from any of the technology fields that the innovation agent entered previously. The strongest links are most likely for the innovation agents to traverse to enter newer fields. To find the start field of the strongest link to a newly entered class, we only consider those previously entered fields in which the innovation agent is still active, e.g. having patents in the most recent 5 years prior to entering the newer target field. If an innovation agent has not had patent in a field for many years since the agent entered field, it indicates that the agent has left that field and no longer has innovation capability there. Figure 9 presents an example to illustrate the procedure to identify such strongest links. For example, assuming that an innovation agent entered four fields, A, B, C and D in 2000, 2002, 2004 and 2006 respectively. To identify the strongest path for the agent to diversify into field D in 2006 from its prior knowledge positions, we first identify the fields where the agents had patents in the 5 years prior to the first patent in 2006. The agent had no patent in field A in this recent 5-year time period (i.e., from 2002 to 2006), indicating that the agent no longer maintains innovative capability in field A and the diversification path into field D in 2006 should not start from field A. Alternatively, one can use a shorter or longer period than 5 years to determine the most recent capability positions of an innovation agent before it enters new positions. After A is ruled out, we compare the relatedness values of the link between B and D and the link between C and D. Link B-D is stronger than link C-D, so we consider B->D the strongest path of diversification into D and highlight the arrow from B to D on the background map. All highlighted arrows (i.e. strongest links) together constitute the main paths of patent portfolio diversification of an innovation agent over time.
21
Figure 9. The procedure to identify the strongest link entering a target field from existing fields of an agent. A, B, C, D represent 4 different technology fields that the agent entered in year 2000, 2002, 2004 and 2006 respectively. Numbers next to the links denote the knoweldge relatedness between the pairs of technological fields.
Leonard Forbes Leonard Forbes is a retired professor at Oregon State University, and one of the most prolific inventors of all times. His primary inventions are related to semiconductor memories, thin film processes, and other electronic devices. In our total US patent dataset from 1976 to 2010, we find he has 927 patents classified in 16 IPC classes. Table 2 lists his 16 IPC patent classes in the order of the year when he entered each class (i.e. entering means being granted the first patent in a class), and the number of his patents in each class, as of 2010. The distribution of patents across IPC classes shows that his largest field of patenting, i.e. electronic elements (H01), is also the field where he began his innovation trajectory in 1991. Table 2: IPC Classes in Which Leonard Forbes Had Patents in the Period from 1976 to 2010 #
Class ID
1 2 3 4 5 6 7 8 9 10 11 12 13
H01 G11 A01 B05 G03 G05 H03 H04 H05 C23 G06 G02 G01
IPC Class Title* Electric Elements Information Storage Agriculture Spraying & Atomizing Photography, Electrograph & Holography Controlling & Regulating Electronic Circuitry Electric Communication Electric Techniques Coating Metallic Material Computing Optics Measuring & Testing
22
Year of Entrance 1991 1997 2000 2000 2000 2000 2000 2001 2001 2002 2002 2003 2004
Number of Patents of Forbes in the Class 659 158 1 4 3 4 65 4 2 7 4 7 3
14 H02 Electric Power 2004 15 C30 Crystal Growth 2006 16 B32 Layered Products 2009 Note: the titles here are abbreviated. Corresponding full titles can be found in the appendix.
3 2 1
Figure 10 highlights the 16 technology fields where Forbes had patents, the earliest year when he entered each field, and the 5 main paths of patent portfolio diversification over years. The main paths were comprised of the strongest links to the later-entered fields, from any of his previously entered fields. Interestingly, it turns out that these links on the main paths are also the strongest ones among all links of the starting fields, as indicated by the high p values in Table 3. The p value is the percentile of the starting field’s all links, which have lower or equal relatedness values than the arrowed link from the starting field. For example, the p value of 0.9333 for the diversification link from electric elements (H01, 1991) to information storage (G11, 1997) means that the relatedness value of this link is greater than or equal to 93.33% of all the non-zero links of class H01 electric elements. p values for all diversification links in the main paths are reported on the map in Figure 10. The overall high values of p for all the outgoing links of the starting fields indicate that the inventor preferred to diversify into new fields that are highly related to the ones that he has previously entered. It also indicates that the main paths identified here well contain the most likely diversification links of the inventor over time.
23
Figure 10. Main paths of diversification of Leonard Forbes. Nodes are IPC classes; node sizes correspond to the number of patents in the represented patent technology classes. Forbes had patents in the highlighted blue nodes; node color intensity denotes the number of patents of Forbes in respective classes.
Table 3. Relatedness of the links in the main paths relative to all links of the starting fields of Leonard Forbes.
Path 1
2 3 4 5
Diversification Link Electric elements (H01, 1991) Information storage (G11, 1997) Information storage (G11, 1997) Electronic circuitry (H03, 2000) Electronic circuitry (H03, 2000) Electric communication (H04, 2001) Electric communication (H04, 2001) Computing (G06, 2002) Computing (G06, 2002) Measuring & testing (G01, 2004) Electric elements (H01, 1991) Controlling & regulating (G05, 2000) Controlling & regulating (G05, 2000) Electric power (H02, 2004) Electric elements (H01, 1991) Photography, Electrograph & Holography (G03, 2000) Photography, Electrograph & Holography (G03, 2000) Optics (G02, 2003) Electric elements (H01, 1991) Electric Techniques (H05, 2001) Electric elements (H01, 1991) Spraying or Atomizing (B05, 2000) Spraying or Atomizing (B05, 2000) Coating Metallic Material (C23, 2002) Coating Metallic Material (C23, 2002) Layered Products (B32, 2009) Coating Metallic Material (C23, 2002) Crystal Growth (C30, 2006)
p-value: Percentile of Starting Field’s Links