patents

Patent landscaping for life sciences innovation: toward consistent and transparent practices

Tania Bubela, E Richard Gold, Gregory D Graff, Daniel R Cahoy, Dianne Nicol & David Castle

© 2013 Nature America, Inc. All rights reserved.

As industry, governments and academia increasingly rely on patent landscapes to map scientific and technological trends, an interdisciplinary workshop provides recommendations for developing consistent and transparent landscaping practices.

Tania Bubela is at the School of Public Health, University of Alberta, Edmonton, Alberta, Canada; E. Richard Gold is in the Faculty of Law, McGill University, Montreal, Quebec, Canada; Gregory D. Graff is in the Department of Agricultural and Resource Economics, Colorado State University, Fort Collins, Colorado, USA; Daniel R. Cahoy is at Smeal College of Business, The Pennsylvania State University, University Park, Pennsylvania, USA; Dianne Nicol is in the Faculty of Law, University of Tasmania, Hobart, Tasmania, Australia; and David Castle is at Innogen Institute, University of Edinburgh, Edinburgh, UK. e-mail: [email protected]

volume 31 number 3 MARCH 2013 nature biotechnology

More than ever before, industry, governments and academia rely on ‘landscapes’ to map scientific and technological trends within specific fields of science and technology. A landscape is an analysis of the relationships between multiple sets of indicators or of those indicators measured against temporal, technical or spatial dimensions. Indicators might include scientific articles, patents, clinical or field trials, regulatory approvals, and actors or institutions. Additional analyses can represent network connections or the density of clusters of scientific or technological fields. As countries within the Organization for Economic Co-operation and Development strive to integrate science policy and innovation strategy¹, they place greater reliance on landscapes to track trends and support the coordination of activities, actors and institutions².

Despite the growing prevalence and importance of landscapes, including some published in high-impact scientific and policy journals, serious inconsistencies persist in landscaping techniques³. These inconsistencies create obstacles in assessing the informational value of individual landscapes and in comparing, combining and extending multiple landscapes. A second challenge arises from the lack of transparency about the data and techniques used.

In response, three of the authors (Bubela, Castle, Gold) organized a workshop with representatives from industry, academia, public funding agencies, patent offices and other government agencies. Workshop participants discussed landscaping practices and challenges for the representative field of synthetic biology. Based on workshop presentations and discussion, the authors developed recommendations to improve landscaping methodologies and to make their use more consistent and transparent. Workshop participants reviewed the recommendations.

This article focuses on patents as the most commonly used indicator in landscapes⁴. Patent landscapes vary in scale and scope, ranging from specific in-depth analyses of a narrow range of patents to large-scale landscaping of entire technological fields. Landscapes of any scale seek to encompass, as much as possible, an entire population of relevant data, rather than a random sample drawn from that population. In science and technology, as well as in legal disputes, single seminal events can be crucial to understanding an entire field, but can easily be missed in a random sample. The resulting data can be visualized graphically or can comprise counts of indicators across selected dimensions. The data capture portions of the applied and translational research environment, identifying areas of research and development (R&D) considered to be of commercial value and providing an understanding of the potential influence of intellectual property (IP) rights on innovation.

Although freedom-to-operate opinions fall within the formal definition of a landscape, they follow a set of norms unique to the practice of law and focus on the analysis of potential liabilities. Also closely related are corporate, regional or national assessments of R&D activities; however, to the extent that they cut across multiple fields of technology found within the individual firm or geographic region, while excluding any context beyond its boundaries, they fail to fit the definition of a landscape. We therefore do not discuss them further.

Tailoring analytical strategies

The most effective patent landscaping protocols align scope and methodology with the purpose of the target audience and the specific issues to be addressed (Table 1). Various audiences, particularly policy makers and academic consumers, share an interest in some of the same issues; other communities have narrowly defined, unique interests. Similarly, practitioners of patent landscaping vary from dedicated corporate actors conducting analyses for internal or contract purposes, to researchers interested in academic or policy questions, to government agencies, such as policy branches of national IP offices and international organizations.

Table 1 Landscaping issues and analysis

Issue that can be addressed through landscaping: What aspects of particular technologies, products or fields do patents cover?
Type of landscape/analysis: Technology landscape; comparative technology landscape; prior art search; licensing opportunity analysis
Common metrics: Classification/claim type/technology keywords × issued patents, patent applications, country of origin, assignee/inventor

Issue that can be addressed through landscaping: How do patent rights affect certain firms or institutions?
Type of landscape/analysis: Institutional portfolio analysis; inventor portfolio analysis; performance review; competitor analysis; industry analysis
Common metrics: Assignee(s)/inventor(s) × issued patents, patent applications, country of origin, classification/claim type

Issue that can be addressed through landscaping: How do patent rights map to geographic regions or countries?
Type of landscape/analysis: Regional innovation indicators; innovation cluster analysis; foreign filing analysis; international patent family landscape; country/region of inventor and movement over time
Common metrics: Geography/region × classification/claim type, patent applications, country of origin

Issue that can be addressed through landscaping: Which patents are the most important or valuable?
Type of landscape/analysis: Claim construction; bibliometric analyses; litigation analysis
Common metrics: Number/scope of claims; forward citation; patent families; issued patents; assignee/inventor; litigation; maintenance fees

Issue that can be addressed through landscaping: How do patents relate to one another?
Type of landscape/analysis: Bibliometric analyses; network citation analyses; semantic similarity analysis
Common metrics: Forward/backward citations; keywords; co-inventorship; assignee/inventor links; network statistics

Issue that can be addressed through landscaping: How might patents affect innovation or competition policy?
Type of landscape/analysis: Patent counts; patent claims analysis; patent density; statistical modeling
Common metrics: Scope of claims; issued patents; classification/claim type; patent applications; country of origin; assignee/inventor

For industry, patent landscapes can enable a high-level understanding of a sector for strategic planning, especially across jurisdictions⁵. An analysis of other actors in the same technology space can assist a company in evaluating its initial idea in relation to potential competitors, distribution channels and partners. An analysis of the way claims are drafted (e.g., narrowly or broadly) for similar technologies in different jurisdictions can assist in developing patent drafting strategies. Referred to as claims construction, such analysis can also assist in identifying prior art that ought to be referenced or distinguished to meet patent criteria of novelty and inventiveness. Monitoring the field is essential for longer-term asset management and the development of market strategies, as well as for firm valuation and portfolio analysis for investment, and mergers and acquisitions⁶.

Some IP offices engage in larger-scale patent landscaping to assist national or regional innovative firms and government agencies that set science and technology policies or funding strategies. For example, the United Kingdom Intellectual Property Office (UKIPO) has a policy branch that produces landscapes and analyses of specific technology domains of strategic interest for economic development in the UK, such as stem cell research⁷. Similarly, the policy unit of the Japanese Patent Office (JPO) tracks technology of national interest to provide advice and information to Japanese industry and policy makers. For example, the JPO has been tracking the rise of patenting by Chinese applicants in Japan, especially in the field of nanotech. The European Patent Office (EPO) engaged in a joint project with the United Nations Environment Program and the International Centre for Trade and Sustainable Development (ICTSD) to landscape patents on clean energy technologies⁸.

Patent offices are uniquely positioned to provide landscapes to guide national industrial policy and firm strategies. They have access to internal databases, including correspondence between examiners and applicants, and to highly qualified technical staff with not only scientific knowledge, but also an understanding of how to interpret patent claims, of emerging terminology and of the patenting process itself.

Broader analyses comparing technology fields over time and between jurisdictions can inform national investments in R&D. Funding can be directed toward identified regional strengths, innovative sectors or areas of growth. Conversely, investment can be spurred in underfunded or emergent fields in response to a growth in R&D in competitor markets. Tracking inventor affiliations and applicant firm histories can enable policy makers to analyze the flow of human resources within and between jurisdictions. For example, China is attracting researchers and promoting studies abroad for its citizens to advance its high-technology research agenda in fields such as nanotech⁹.

Further, assessing the level of innovation in specific fields can assist regulators and technology assessment agencies to proactively develop policies and procedures for emerging technologies. The forecasts enabled by landscapes are invaluable because it takes time to adapt existing regulatory frameworks for health, food and drugs, environmental and laboratory practices to new technologies.

In addition to defining the broad contours of a technology, patent landscaping can be used to address specific questions of policy and academic relevance. These include assessing the impact of permissive or restrictive research and IP policies in specific technology sectors and identifying emerging patent thickets, blocking patents or potential antitrust issues. With this knowledge, policy makers will be in a better position to intervene (through the exercise of government use rights, waiver of sovereign immunity, compulsory licensing or the development of guidelines for the issuance of patents) or to refrain from intervening, as appropriate.

Patent landscaping methodologies

When engaging in patent landscaping, practitioners generally begin by defining the purpose of the proposed landscape and its potential scope combined with a realistic assessment of time and cost constraints⁴. Data collection and processing phases are especially time consuming, and the need to access proprietary databases can increase costs.
Maximizing data capture at the first stage increases the likelihood that retrieved data can be reanalyzed if the focus changes, because different analyses based on different sets of metrics can be applied to collected data sets, depending on the issue to be addressed. The most common strategy is to define a field of interest and develop a search strategy that captures as complete a set of patent documents as possible. An alternative strategy is to define foundational or enabling technology patents and then build a larger set of follow-on patents through linkages between patent documents, such as by following citations. This second strategy can be useful for emerging but ill-defined fields such as nanotech and synthetic biology. Figure 1 illustrates the iterative process of patent landscaping.

Figure 1 The iterative process of patent landscaping. [Flowchart; box labels: Define purpose and scope of landscape; Assess time & cost constraints; Consult with information & technical specialists; Design search strategy (identify database(s), develop search algorithm(s)); Data cleaning & curation (merge data sets, define degree of error tolerated, remove irrelevant documents); Expert validation; Descriptive statistics; Visualization of trends; Augment data set with additional fields or manual coding; Higher-order analytics (importance/value, relationships between documents, impact of document on innovation).]

Search. Once the broad issues and the overall strategy have been determined, the next step is to design the specific search strategy. This requires identifying a suitable patent database from among both publicly available and proprietary options, many of which are listed in a resource provided by the Patent Information Users Group (http://piug.wildapricot.org/vendors), followed by the development of database-specific search algorithms combined with limits to the scope of the search (for example, date, legal status or patent class/code)¹⁰. At this stage, it is advantageous to consult with a specialist in information science who is specifically trained in the development of search algorithms.

Public databases, although free, have limited capabilities for complex search algorithms, provide only basic informational fields within specific countries or regions, and generally provide limited higher-order analytic tools¹¹. PatentLens, from the Initiative for Open Innovation (IOI), is an exception to the latter, offering searching in multiple languages and tools specific to searching protein and DNA sequences worldwide (http://www.patentlens.net). The World Intellectual Property Organization’s (WIPO) Patent Scope is another exception, as it provides basic visual representations of patent application data such as inventor or country of origin (http://patentscope.wipo.int/search/en/search.jsf). Private databases, most notably Thomson Innovation and Elsevier’s Scopus Database, enable complex search algorithms (Boolean and/or natural language) across multiple jurisdictions, provide hand-curated information in addition to standard information based on public data, and offer higher-order analytics, such as corporate histories and sophisticated visualization tools.

The database-specific search algorithm can combine sets of keywords comprising synonyms for specific technologies or broad fields. Keywords are often combined with patent class codes that group patents into technological categories. Patent class codes include the US Class (USC), International Patent Classification (IPC), European Patent Class (EPC) and Derwent Manual Codes. The new Cooperative Patent Classification has recently been made available as a joint initiative to speed the patent granting process between the US Patent and Trademark Office and the EPO, and has replaced the US and European classes as of January 1, 2013. Other limits on searches can be organized by geography, assignees, inventors or other definitional fields.

One example of a published search algorithm is the Ade/Cook-Deegan algorithm used by the DNA Patent Database housed at Georgetown University in Washington, DC, to identify all DNA and RNA patents in the Delphion database (http://dnapatents.georgetown.edu/SearchAlgorithmDelphion-20030512.htm). Another is the algorithm developed by Bergman and Graff to capture all stem cell–related patents in the Thomson Innovation database¹². Such algorithms are developed in consultation with technical experts. Expert validation ensures that important branches or key technologies within a field are not omitted, and conversely, refines the algorithm to reduce the percentage of irrelevant documents. The steps of algorithm development, and the next steps of data collection, data cleaning, data curation and exploratory analyses, are necessarily iterative and reliant on expert input to arrive at a final data set capable of addressing strategic, policy or academic questions. The breadth or specificity of the questions that can be addressed depends on the scale and quality of that data set.

Cleaning and curating. The next step, after the search for and collection of patent documents, is to clean and curate data, a time-consuming exercise, especially if data have been collected from multiple public databases. Cleaning involves reviewing the documents in the data and discarding any that are outside of the intended scope of the analysis. Cleaning can be automated using criteria built into the search function, or technical experts can do it manually if the data set is not impracticably large.
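The keyword-plus-class-code search and the automated cleaning step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the syntax of any real patent database: the `SearchSpec` container, the `CLASS=` and `YEAR>=` field operators, the record field names and the example keywords and class codes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SearchSpec:
    """Hypothetical container for one landscape search definition."""
    keyword_groups: List[List[str]]            # each inner list holds synonyms for one concept
    class_codes: List[str] = field(default_factory=list)
    earliest_year: Optional[int] = None


def quote(term: str) -> str:
    """Quote multi-word terms so they are treated as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(spec: SearchSpec) -> str:
    """Join synonyms with OR, join concept groups with AND, then append scope limits."""
    parts = ["(" + " OR ".join(quote(t) for t in group) + ")"
             for group in spec.keyword_groups]
    if spec.class_codes:
        # CLASS= is an invented operator; real databases use their own field syntax.
        parts.append("CLASS=(" + " OR ".join(spec.class_codes) + ")")
    if spec.earliest_year is not None:
        parts.append(f"YEAR>={spec.earliest_year}")
    return " AND ".join(parts)


def in_scope(record: dict, spec: SearchSpec) -> bool:
    """Automated cleaning: keep a record only if every concept group matches its text.

    Plain substring matching is used, so a short keyword can also match inside
    a longer word; expert review of a sample is still needed.
    """
    text = (record.get("title", "") + " " + record.get("abstract", "")).lower()
    return all(any(t.lower() in text for t in group)
               for group in spec.keyword_groups)


# Illustrative values only.
spec = SearchSpec(
    keyword_groups=[["synthetic biology", "synthetic genome"], ["gene", "DNA"]],
    class_codes=["C12N", "C12Q"],
    earliest_year=2000,
)
query = build_query(spec)
kept = in_scope({"title": "Synthetic biology toolkit",
                 "abstract": "A DNA assembly method"}, spec)
dropped = in_scope({"title": "Bicycle frame",
                    "abstract": "A welded frame"}, spec)
print(query)
```

Keeping the search definition in a single structure means the same specification can drive both retrieval and cleaning, and can be published alongside the landscape so that others can reproduce it.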
Data curation includes the merging of data from different databases into consistently formatted and structured data sets, possibly followed by manual or automated mechanisms for periodic future data updates. International analysis of patent data necessarily introduces the complication of single inventions being represented as ‘families’ of filings at multiple patent offices. Curation of a landscape involving more than one jurisdiction must manage the data at the level of patent family and the level of individual patent documents⁸. The degree of error that can be tolerated, both in terms of the relevance of the document set and the structure of individual fields, depends on the scale of the analysis, partly based on considerations of time and cost⁴. Nevertheless, landscapers must assess the nature (for example, a tendency toward over- or under-inclusion) and magnitude of errors, possibly through a detailed analysis of a subset of the data set. For broad landscapes directed toward identifying, as completely as possible, an entire population of documents, a metric for assessing data saturation should be defined and reported. The error types and rates, as well as underlying assumptions, must be reported so users may assess the reliability and validity of a given landscape.

Analysis. Analysis generally commences with standard exploratory or summary statistics per data field and simple analyses such as how these fields vary temporally or by patent jurisdiction. Landscapes, by definition, are multidimensional, analyzing document attributes across combinations of time, geography and technical field. Document attributes equate to data fields such as inventors, assignees or applicants, legal status, or technology classification and/or code. Such simple analytics, although nominally informative, also provide validation of the prior steps and can point to the need for further refinements in the data.

The data set can be augmented with additional data by, for example, categorizing patent assignees or inventors as public or private sector actors; coding patent claims for subject matter, scope or validity; or doing automated semantic analyses of keywords descriptive of specific fields or types of claims, such as distinguishing between products and processes. For questions regarding the technology actually protected, manual coding of the claims of the patent documents within the landscape is often essential for meaningful conclusions; for example, coding whether patents for genetic diagnostic tests are blocking or can be circumvented by public laboratories¹³.
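The family-level curation and the simple count analyses described above can be illustrated with a short Python sketch. It assumes a merged list of records with hypothetical field names (`doc_id`, `family_id`, `jurisdiction`, `year`) and invented sample values; real exports differ by database and would need their own mapping.

```python
from collections import Counter, defaultdict

# Hypothetical records merged from several databases (all values invented).
documents = [
    {"doc_id": "US-A", "family_id": "F1", "jurisdiction": "US", "year": 2008},
    {"doc_id": "EP-A", "family_id": "F1", "jurisdiction": "EP", "year": 2009},
    {"doc_id": "US-A", "family_id": "F1", "jurisdiction": "US", "year": 2008},  # duplicate from a second source
    {"doc_id": "JP-B", "family_id": "F2", "jurisdiction": "JP", "year": 2009},
    {"doc_id": "US-C", "family_id": "F3", "jurisdiction": "US", "year": 2009},
]


def deduplicate(docs):
    """Keep the first record seen for each document identifier."""
    seen = {}
    for doc in docs:
        seen.setdefault(doc["doc_id"], doc)
    return list(seen.values())


def group_families(docs):
    """Map each family identifier to the documents that belong to it."""
    families = defaultdict(list)
    for doc in docs:
        families[doc["family_id"]].append(doc)
    return dict(families)


unique_docs = deduplicate(documents)
families = group_families(unique_docs)

# Document-level view: filings per jurisdiction and year.
doc_counts = Counter((d["jurisdiction"], d["year"]) for d in unique_docs)

# Family-level view: each invention counted once, by the earliest year in its family.
family_counts = Counter(min(d["year"] for d in members)
                        for members in families.values())

print(doc_counts)
print(family_counts)
```

Reporting both views side by side makes it visible when apparent growth in a field reflects wider foreign filing of the same inventions, not a larger number of inventions.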
Automated lexicographic analyses of claims provide only limited inferences about the nature of the research activity or the scope of control because they ignore the subtleties imposed by legislative texts and court decisions. Experts such as patent attorneys, agents and examiners are well versed in these subtleties and are best placed to conduct such analysis. Large data sets cannot realistically be coded by hand. Here, analysts must rely on coding frames to analyze claims language. In these cases, it is imperative that the methods, the background of the coders, their coding training, measures of intercoder reliability, and the frame itself are adequately explained and contextualized.
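One way to report the intercoder reliability mentioned above is Cohen's kappa, which corrects the raw agreement between two coders for the agreement expected by chance. The article does not name a specific measure, so the choice of kappa here is an assumption, and the claim codes below are invented for illustration.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who labeled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected from each coder's label frequencies.
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must label the same non-empty set of items")
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical coding of eight claims as product or process claims.
coder_a = ["product", "product", "process", "process",
           "product", "process", "product", "process"]
coder_b = ["product", "product", "process", "product",
           "product", "process", "process", "process"]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

In this invented example the coders agree on six of eight claims (75%), but because half of that agreement would be expected by chance, kappa is 0.5. Reporting the chance-corrected figure alongside the coding frame gives readers a more honest view of how reproducible the manual coding is.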

Qualitative characteristics of patents, such as their scientific significance or private economic value, can be inferred by examining the number of claims, the number of jurisdictions covered by the patent family and the number of times the patent is cited¹⁴,¹⁵. Studies have shown that the number of citations made to a patent is related to the private economic value of that patent¹⁶,¹⁷. The technical diversity of citing patents, as indicated by the range of technology classes they cover, can be related to the scientific “generality” or “basicness” of the subject patent¹⁸. The degree to which a patent has been litigated is also correlated with patent value, and patents can be linked to litigation databases¹⁹.

The final, augmented data set can then be analyzed using sophisticated visualization and graphing techniques or statistical modeling techniques, depending on the issues of interest²⁰. There are many available visualization and analytic tools with varying utility¹¹. Geographic Information System (GIS) software can represent nodes (for example, individuals or cities) and the strength of the linkages between them on a geospatial map. Software that visualizes collaborative networks of inventors or applicants can also display node attributes, the strength of the linkages between them and calculations of network statistics that indicate how central or important an actor is within the network. Thematic maps cluster documents according to similarity of keywords, showing peaks and valleys of activity within specific fields. Lexicographic analyses further allow the tracking of text between documents so that keywords, phrases or concepts can be traced through different forms of documents originating with different sets of actors, for example, linking scientific publications with patents.
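The citation-based indicators described above can be computed from a simple list of citation pairs. The sketch below counts forward citations and calculates a generality score as one minus the sum of squared shares of citing patents across technology classes, a Herfindahl-type formulation commonly used in the patent citation literature. The patent identifiers, class assignments and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (citing patent, cited patent) pairs and a class lookup.
citations = [("P2", "P1"), ("P3", "P1"), ("P4", "P1"), ("P5", "P1"), ("P4", "P2")]
tech_class = {"P1": "C12N", "P2": "C12N", "P3": "C12N", "P4": "G06F", "P5": "A61K"}


def forward_citations(pairs):
    """Map each cited patent to the list of patents that cite it."""
    cited_by = defaultdict(list)
    for citing, cited in pairs:
        cited_by[cited].append(citing)
    return dict(cited_by)


def generality(citing_patents, classes):
    """Return 1 - sum(share^2) over the technology classes of the citing patents.

    0 means all citations come from a single class; values closer to 1 mean
    citations are spread across many classes. Returns None if there are no citations.
    """
    counts = Counter(classes[p] for p in citing_patents)
    total = sum(counts.values())
    if total == 0:
        return None
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


cited_by = forward_citations(citations)
scores = {patent: generality(citing, tech_class)
          for patent, citing in cited_by.items()}

for patent, citing in cited_by.items():
    print(patent, len(citing), scores[patent])
```

In this invented example, P1 receives four forward citations from three classes (shares 0.5, 0.25 and 0.25), giving a generality of 0.625, whereas P2 receives a single citation and scores 0. With so few citations the index is unstable, so such scores are better compared across patents of similar age and citation volume.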
Finally, bibliometric analyses of citation patterns (for example, backward and forward citations and co-citations to other patents or scientific literature) can help to analyze the intellectual structure of a discipline, to track citations through generations of publications in a field, or to identify and track seminal or foundational documents²⁰,²¹.

Recommendations for practitioners

Landscapes can be powerful tools for developing policy and business strategy. Most, however, are relatively simple, relying on count data and possibly thematic maps⁴. Many are opaque in the description of their methods, risking interpretations that go beyond the inherent limitations of the landscape in question. One example is the much-quoted statistic that 20% of human genes are patented and likely to impede next-generation genetic technologies. This statistic is based on a novel automated landscaping method developed by Jensen and Murray, which mapped sequences named in patents (as SEQ IDs) to the human genome²². Further examination of a subset of the patents by the most accurate method of claim construction, reading by experts, found that far fewer of these alleged human gene patents actually recite human DNA molecules, methods for genetic testing or genetic sequencing in the claims²³,²⁴. Although this does not undermine the value of the methodology developed by Jensen and Murray, it points to the misunderstanding and possible misuse by others of landscaping studies, particularly where methods are unclear or misinterpreted.

Landscaping practitioners must be careful to select methodologies capable of answering questions of interest at the appropriate scale. An iterative process with relevant expert consultation is essential. The limitations of the databases, search strategies and analytical methods must be made explicit. Reliance on proprietary landscapes can accordingly be problematic because search algorithms and landscaping methods are rarely published. Transparency would be improved if, when proprietary databases are used, patent numbers are included in an appendix or Supplementary Information. This is simply good practice for all scientific and social science research; methods should be readily replicable to assess the validity of conclusions. Journals can act as gatekeepers to ensure methodological best practices for social science research.

To improve landscaping activities, governments, research institutions and private foundations should support the development of publicly accessible databases and associated analytical tools³. This holds true for patent databases and for publications databases more generally.
The best overview of an innovation landscape can be accomplished by situating the patent landscape within other information on R&D and corporate activities, for example, by cross-referencing to publications and mergers-and-acquisitions databases.

Patent and other landscaping techniques are critical factors in developing science and technology policy and business strategy. As with many other emerging techniques, early adopters have experimented with an assortment of methods to provide valuable insight into technology spaces. Landscaping techniques are now sufficiently mature and used widely enough in support of science and innovation policies that general methodologies, reporting criteria and consistent practices ought to be established. These will enhance the reliability of landscapes upon which business, government and academic leaders can base science and technology decisions.

ACKNOWLEDGMENTS
We would like to thank the participants from industry, government agencies and academia at the Managing Knowledge in Synthetic Biology: The Creation of Tools for Stronger Intellectual Property Analysis workshop hosted by the Innogen Institute of the University of Edinburgh in 2012. The workshop was supported by VALGEN (Value Addition through Genomics) and GE3LS (Genomics and its Related Ethical, Environmental, Economic, Legal and Social Aspects), a project sponsored by the Government of Canada through Genome Canada, Genome Prairie and Genome Quebec, and the PhytoMetaSyn project funded by Genome Canada, Genome Alberta and Genome Quebec. We thank A. Baker, D. Lewensohn, M. Bieber and L. Dacks for administrative and research support.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. OECD. The OECD Innovation Strategy: Getting a Head Start on Tomorrow (OECD, Paris, 2010).
2. OECD. Collaborative Mechanisms for Intellectual Property Management (OECD, Paris, 2011).
3. Gold, E.R. & Baker, A. J. L. Inf. Sci. 22, 76–97 (2012).
4. Managing Knowledge in Synthetic Biology: The Creation of Tools for Stronger Intellectual Property Analysis, Edinburgh, UK, June 20–21, 2012.
5. Lee, S., Yoon, B., Lee, C. & Park, J. Technol. Forecast. Soc. Change 76, 769–786 (2009).
6. Breitzman, A.F. & Mogee, M.E. J. Inf. Sci. 28, 187–205 (2002).
7. United Kingdom Intellectual Property Office. Regenerative Medicine: The Patent Landscape in 2011 (Intellectual Property Office, Newport, 2011).
8. United Nations Environment Program. Final report: patents and clean energy: bridging the gap between evidence and policy (UNEP, EPO, ICTSD, Munich, 2010).
9. Lenoir, T. & Herron, P. J. Biomed. Discov. Collab. 4, 8 (2009).
10. Bonino, D., Ciaramella, A. & Corno, F. World Pat. Inf. 32, 30–38 (2010).
11. Yang, Y.Y., Akers, L., Klose, T. & Yang, C.B. World Pat. Inf. 30, 280–293 (2008).
12. Bergman, K. & Graff, G.D. Nat. Biotechnol. 25, 419–424 (2007).
13. Huys, I., Berthels, N., Matthijs, G. & Van Overwalle, G. Nat. Biotechnol. 27, 903–909 (2009).
14. OECD. OECD Patent Statistics Manual (OECD, Paris, 2009).
15. Lanjouw, J.O. & Schankerman, M. Econ. J. 114, 441–465 (2004).
16. Trajtenberg, M. Rand J. Econ. 21, 172–187 (1990).
17. Hall, B.H., Jaffe, A. & Trajtenberg, M. Rand J. Econ. 36, 16–38 (2005).
18. Hall, B.H., Jaffe, A.B. & Trajtenberg, M. NBER Working Paper 8498 (2001).
19. Allison, J.R., Lemley, M.A. & Walker, J. Georgetown Law J. 99, 677–712 (2011).
20. Small, H., Sweeney, E. & Greenlee, E. Scientometrics 8, 321–340 (1985).
21. Bubela, T., Strotmann, A., Noble, R. & Morrison, S. Cell Stem Cell 7, 25–30 (2010).
22. Jensen, K. & Murray, F. Science 310, 239–240 (2005).
23. Holman, C.M. UMKC Law Rev. 80, 563–605 (2012).
24. Holman, C.M. Nat. Biotechnol. 30, 240–244 (2012).
