Learning from Science and Technology Policy Evaluation


7. The evaluation of university research in the United Kingdom and the Netherlands, Germany and Austria

David F.J. Campbell

INTRODUCTION

This analysis focuses on university research evaluation in Europe, comparing in particular the policies of four countries: the United Kingdom, the Netherlands, Germany, and Austria. After a review of current funding trends of university research, conceptual and methodic considerations for university research evaluation are discussed: peer review and/or indicators are described as the two dominant approaches to assessing and measuring university research. With regard to comprehensive institutional ex-post research evaluation, a comparative empirical typology is proposed for discussion, juxtaposing ‘Type A’ countries (United Kingdom and the Netherlands) and ‘Type B’ countries (Germany and Austria). Within that classification, Type A implies that comprehensive ex-post research evaluations, covering and addressing all disciplines at national level, are already implemented. In detail we then compare and comment on the methods of ex-post research evaluation in the United Kingdom and the Netherlands. Afterwards, an overview of the contemporary situation of the evaluation of university (and university-related) research in Germany and Austria is presented. The conclusions propose different policy scenarios for how Type A and Type B countries might relate to each other in the future with regard to a possible cross-country spreading of comprehensive ex-post research evaluations. Briefly, the question is also raised whether one can speak of a ‘co-evolution’ of university research and university research evaluation.


FUNDING TRENDS OF UNIVERSITY RESEARCH AND ITS IMPLICATIONS FOR UNIVERSITY RESEARCH EVALUATION

Contemporary advanced societies are commonly described as ‘knowledge-based’, implying that knowledge, knowledge production, know-how, and expertise should be regarded as important factors that determine to a large extent economic performance and economic competitiveness (Bell [1973] 1999; European Commission, 1997, pp. 7−10; Gibbons et al., 1994; Hicks and Katz, 1996; IMD, 1996, p. 12; Müller, 1999; Porter, 1990). Crucial in such an understanding is the conviction that research or R&D (research and experimental development) should be regarded as the core processes responsible for knowledge and knowledge production. Research, in turn, can be differentiated into basic and applied research (OECD, 1994). Furthermore, within the context of national (or supranational) research and innovation systems, academic research plays a pivotal role. The term academic research addresses university as well as university-related research (Campbell and Felderer, 1997, pp. 2−3). Referring to standardized OECD terminology, university research coincides with R&D that is performed by the higher education sector, and university-related research coincides with R&D performed by the government and private non-profit sectors (thus ‘university-related’ research can be regarded as a terminological equivalent of the German-language term ‘außeruniversitäre Forschung’ – see BMBF, 1998a, p. 14; 1998b, p. 14). Business R&D consequently represents the R&D which is carried out by the economy (the business enterprise sector). Academic research clearly represents a sciences-based and sciences-induced activity, where a major emphasis is placed on basic research and on the combination of basic and applied research. ‘Sciences’, in its broadest meaning, covers all of the sciences, including the natural and life sciences, engineering, social sciences, and humanities. Since academic institutions, and universities in particular, also conduct tertiary teaching and education, academic research is closely associated with the development and build-up of highly qualified human capital (OECD, 1998a).

One can argue that within the context of national research and innovation systems1 the specific function of university research is to perform basic research. Empirically, expressed in financial terms, it can be demonstrated that for the (aggregated) EU (European Union) as well as for the US and Japan the basic research of the universities outpaces the basic research carried out by business (OECD, 2000a; 2000b). The main emphasis of the business enterprise sector is on applied research and experimental development. Since the ‘life cycles’ of products and services are decreasing (European Commission, 1995, p. 17), the importance of basic research is reinforced, because only basic research guarantees long-term innovation. At the same time ‘research time horizons’ decrease, demanding a faster market


entry of R&D products and services (OECD, 1998b, pp. 179−81, 185−86). One solution to this challenge is to place an emphasis on the ‘paralleling’ of basic research, applied research and experimental development (Campbell, 2000, p. 139). Paralleling can be realized through organizational reform within academic (university and university-related) and business institutions. Another approach stresses the linking of university (basic) research more effectively to R&D in the university-related and business sectors. In summary, the evaluation of university research therefore implies that primarily ‘basic research’ is assessed. Focusing on current midterm R&D funding trends of the advanced OECD countries, the following observations can be made for the European Union (EU 15), the United States and Japan: When national (supranational) R&D expenditure is calculated in million constant US dollars – in prices and purchasing power parities (PPP) of 1995 – per a population of 100 000, then a real increase can be observed for the period 1985-1998, although expenditure stagnated in the first half of the 1990s (see Figure 7.1). Since national R&D is primarily funded and performed by business (OECD, 2000a, pp. 20−23), the OECD-wide decline of economic growth rates during the early 1990s pressured national R&D expenditure (OECD, 1999a, p. 50). However, towards the end of the 1990s the national R&D expenditure again substantially gained in momentum. If the same indicator (using the same mode of calculation) is applied only for the university R&D, then we also can state that university R&D expenditure expanded in the European Union, the United States and Japan during the period 1985−98 (see Figure 7.2). In the EU and the US the university R&D even realized higher growth rates than national R&D. The general R&D gap between the EU vis-à-vis the United States and Japan should be an issue of concern for European decision-makers. Since the university R&D gap is less pronounced than the national R&D gap, this can imply: First, university R&D is for the European research and innovation systems of a particular importance. Second, business R&D in Europe needs improvement. Considering those arguments in a different aspect, the continuous lead of US R&D expenditure should perhaps be regarded as one factor explaining the success of the American economy during the 1990s. Taking into account that the primary competence of university research focuses on basic research, this implies the following conclusions: (a) There convincingly operates an enduring viability of university (basic) research for advanced societies and economies. (b) Universities and university research continue to play an important role for the national (and supranational)

research and innovation systems; thus, the concept of the knowledge-based societies is empirically substantially supported, also by referring to the pivotal role of the sciences within such processes. Therefore, the evaluation of university (basic) research marks an area of strategic relevance.

[Figure 7.1 Gross domestic expenditure on R&D per population of 100 000 (1985−98), for the EU (EU 15), Japan and the US. Currency unit: million constant $ in 1995 prices and purchasing power parities. Source: Author’s own calculations based on OECD (2000b) and Campbell (2000).]

[Figure 7.2 Gross domestic expenditure on university R&D per population of 100 000 (1985−98), for the EU (EU 15), Japan and the US. Currency unit: million constant $ in 1995 prices and purchasing power parities. Source: Author’s own calculations based on OECD (2000b).]
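As a reading aid for Figures 7.1 and 7.2, the following minimal Python sketch shows how an indicator of this kind can be constructed from nominal R&D expenditure. The input values and the simplified deflator/PPP conversion are purely illustrative assumptions, not OECD data or the author's actual calculation.

```python
# Minimal sketch of the indicator used in Figures 7.1 and 7.2:
# gross domestic expenditure on R&D per population of 100 000,
# in million constant dollars (1995 prices and purchasing power parities).
# All input numbers below are hypothetical.

nominal_rd_expenditure = 180_000.0   # million national currency units, current prices
gdp_deflator_vs_1995 = 1.08          # price level relative to 1995 (= 1.00 in 1995)
ppp_rate_1995 = 1.45                 # national currency units per PPP dollar, 1995
population = 82_000_000

# Convert to million constant 1995 PPP dollars
constant_ppp_dollars = nominal_rd_expenditure / gdp_deflator_vs_1995 / ppp_rate_1995

# Normalize per 100 000 population
indicator = constant_ppp_dollars / (population / 100_000)

print(f"R&D expenditure: {indicator:.2f} million constant 1995 PPP $ per 100 000 population")
```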


In most OECD countries, university research is primarily publicly funded. With regard to public funding two different funding modes exist. The first is public basic funding, also called GUF (General University Funds). General University Funds represent more or less automatic public transfer funds (‘block grants’) to the universities. Besides GUF transfers for university research there are also GUF payments for other university activities (for example, teaching or services). The second public funding mode is ‘earmarked’, and implies that specific and well-defined university research is financed (OECD, 1989, pp. 44−46). Normally this will be university research organized in the context of research projects or research programs. Research programs can consist of an aggregation of individual research projects or may include other specific initiatives, such as the (temporary) funding of ‘research centers’ or ‘research networks’. The OECD classifies this second public funding mode as ‘direct government’ funding (OECD, 2000b, Table 7.1). For this earmarked funding we will use the term ‘P&P (projects and programs) funding’, by defining P&P as university research that is financed in the context of projects or programs.2

Non-public funding of university research can generally be regarded as P&P funding and may be disaggregated into the following two components: private P&P, combining funding from the business enterprise and the private non-profit (PNP)3 sectors; and foreign P&P (‘funds from abroad’). An empirical comparison of the funding base of university research in several European countries demonstrates the dominance of public funding and the fact that GUF still represents the most important single funding component (see Figure 7.3). However, the extent of GUF funding can vary considerably from country to country: it ranks high in Austria and the Netherlands, relatively high in Germany and Switzerland, somewhat lower in Finland and France, and clearly lower in the UK. In addition, there is a general recent trend of a relative decline in GUF funding, implying that university research is increasingly funded through public and non-public P&P (see Figure 7.4).

Project- and program-funded university research is already exposed to mechanisms of a twofold or threefold quality control: ‘ex-ante’, during the process of application (for projects or programs); ‘ex-post’, after the completion (of projects or programs); and sometimes also ‘in parallel’ (for example, most research programs have installed processes of continuously monitoring research activities). ‘In parallel’ quality control can of course always be conceptualized as ‘ex-post’ (by subdividing the whole duration of a program into shorter intervals). Therefore, by definition, P&P-funded university research is already evaluated university research.
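To make the funding decomposition used in Figures 7.3 and 7.4 concrete, here is a minimal Python sketch that computes the funding shares of university research by mode for one hypothetical country; the amounts are illustrative assumptions, not OECD figures.

```python
# Minimal sketch: decomposing university research funding by mode,
# as in Figures 7.3 and 7.4. All amounts are hypothetical (millions of
# constant 1995 PPP dollars), not actual OECD data.

funding = {
    "public basic (GUF)": 660.0,     # block grants to universities
    "public P&P": 215.0,             # earmarked projects and programs
    "private P&P": 90.0,             # business enterprise + private non-profit
    "foreign P&P": 30.0,             # funds from abroad
    "other": 5.0,
}

total = sum(funding.values())

# Funding shares as a percentage of total funding (cf. Figure 7.3)
shares = {mode: 100.0 * amount / total for mode, amount in funding.items()}

# GUF share, the quantity tracked over time in Figure 7.4
guf_share = shares["public basic (GUF)"]

for mode, share in shares.items():
    print(f"{mode:>22}: {share:5.1f} %")
print(f"GUF share of total funding: {guf_share:.1f} %")
```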

[Figure 7.3 Funding of university research in Europe through public basic funding (GUF) and through project and program-based funding (P&P): public basic (GUF) funding, public P&P funding, private (business + PNP) P&P funding, foreign P&P funding, and other. Funding shares as a percentage of total funding. Countries covered: Austria (1993), the Netherlands (1998), Germany (2000), Switzerland (1998), Finland (1998), France (1998) and the UK (1998). Source: Author’s own calculations based on OECD (2000b).]

[Figure 7.4 Funding of university research in Europe through public basic funding (GUF) during the periods 1986−90, 1991−95 and 1996−2000. Average funding shares of GUF as a percentage of total funding for each period. Source: Author’s own calculations based on OECD (2000b).]

The GUF-funded university research, on the contrary, is not controlled ‘ex-ante’ and resembles, metaphorically speaking, prospectively something like a ‘black box’ (Felderer and Campbell, 1994, pp. 210−11). If there is an interest in assessing GUF-based university research, then ‘ex-post’ evaluations represent a viable and crucial approach. Of course there are good arguments for sustaining a substantial element of public transfer funding for university

research, stressing that GUF fulfills the following functions: it supports basic research (also with a long-term perspective, perhaps with no foreseeable near-application potential); it enables researchers to perform ‘blue sky’ and ‘curiosity-driven’ research activities; it hopefully emphasizes the linkage between research and teaching (Campbell and Felderer, 1997, pp. 56−57); and it can be considered an important ‘cultural element’ for academic ‘intellectual freedom’, at least in Europe. However, derived from the existence and importance of GUF-funded basic university research one can claim that there is a need for comprehensive institutional ex-post evaluations of university research, emphasizing that the whole university sector of a country should be addressed. And of course our theoretical expectation is that the greater the GUF funding component, the greater the demand for such institutional ex-post research evaluations.

CONCEPTUAL AND METHODIC CONSIDERATIONS FOR THE EVALUATION OF UNIVERSITY RESEARCH

There exist at least two conceptual possibilities for representing and measuring university research and the quality of university research (see Figure 7.5). These are: indicators and/or evaluations.

Indicators and indicator systems: one objective of indicators is to represent university research on the basis of ‘quantitative’ data or information.5 This quantitative representation opens up a wide spectrum for sophisticated analyses, using statistics and (computer-supported) graphics. Key output indicators for university research are: publications; P&P-funded university research; and patents. Research input indicators focus on financial investments and (research) personnel; personnel input can in turn be expressed in ‘temporal units’, that is full-time equivalents (FTE). Complex comparisons of research input and output in turn allow the assessment of research efficiency (for a detailed discussion of indicators in the sciences see Hornbostel, 1997). Indicators can be used in a routine mode and permanently, by implementing indicator-based ‘monitoring systems’. Once established, such monitoring systems may operate inexpensively. Public funding formulas in Germany, for example, frequently use indicators (Ziegele, 2000).

Evaluations and evaluation systems: evaluations represent the second option for measuring and judging research and research quality. Methodically, evaluations can apply two approaches: peer review, with or without indicators. The simple definition of peers would be that they are ‘experts’, for example scientists evaluating other scientists. Peers refer to a wide spectrum of information to make their judgments. Normally they also use indicator information. Thus evaluations mostly imply a complex interaction of peer review and indicators. The existence of indicators (and


indicator systems) obviously supports peer review activity. On the other hand, peer review can lead to the development of indicators. The critical assessment and further improvement of indicators usually will demand peer-review involvement. Evaluations can be processed either as ‘external evaluations’ or as (internal) ‘self-evaluations’ (Röbbecke and Simon, 2001, p. 293). In the case of self-evaluations the scientists of those institutions can ‘simulate’ peer activity. The implementation of a variety of evaluation initiatives allows one to speak of the existence of an (advanced) evaluation system (see also Bozeman and Melkers, 1993; Kuhlmann, 1998).

[Figure 7.5 Possibilities and approaches of representing and measuring university research and the quality of university research. Conceptual and methodic possibilities: evaluations (with or without indicators), or indicators (without evaluations). Possible methodic approaches of evaluations (a): peer review, that is judgment derived from expert opinion (with or without indicators), and indicators, that is judgment derived from quantitative data (quantitative information). Possible methodic approaches of evaluations (b): external evaluations (peer review and/or indicators) and (internal) self-evaluations (simulated peer review and/or indicators). Strengths: complexity (peer review) and objectivity (indicators). Weaknesses: subjectivity (peer review) and superficiality (indicators). Source: Author’s own conceptualization based on Campbell (1999).]

With regard to the methodic approaches of peer review and indicators it is important to keep in mind that each of those methods has its implicit strengths and weaknesses. This strength/weakness dichotomy can be expressed, in the case of peer review, with the terms of ‘complexity’ and


‘subjectivity’; and in the case of indicators with ‘objectivity’ and ‘superficiality’ (Campbell, 1999, pp. 374−75; Raan, 1995; see again Figure 7.5).

1. Peer review strength − complexity: the information that peers or expert panels take into account and integrate into their assessment is much broader and saliently more comprehensive than indicator-based information. Experts can conduct analyses of a higher complexity than pure indicator systems.
2. Peer review weakness − subjectivity: peer review potentially suffers from the problem that the outcome of a peer review process is determined (or even biased) by the specific composition of peer panels and their ‘subjective’ preferences. Peers frequently are established scientists, so we are confronted with the problem that innovative research, carried out by young or non-established scientists, may be undervalued. There also exist phenomena of mutual dependency and ‘old-boy networks’ among scientific communities. It is pivotal to find balancing criteria (and control mechanisms) for the selection of peers, although there is nothing like a perfect selection process.
3. Indicator strength − objectivity: one crucial demand put on indicators is that they represent university research in the mode of ‘quantitative’ data or information. Thus indicators allow university research to be ‘counted’ or ‘measured numerically’. This fulfills demands of ‘intersubjective’ (or even ‘objective’?) validation, since results of the counting process should be independent of those who do the counting.
4. Indicator weakness − superficiality: how do we know that the information which is measured by indicators should be regarded as the important and relevant information? In addition, there often is a real problem of actually validating and confirming the ‘quantitative’ data information. For example, it is extremely difficult (and time-consuming) to collect publication data in a comparable and standardized format. Therefore it appears recommendable to expose existing indicator systems to critical peer-review judgment and to look for ongoing improvements.

In practical policy terms, peer review and indicator systems mostly will be combined. Still it can be observed that, for instance in the context of institutional ex-post research evaluations, peer review represents the dominant approach, in which peers also use indicator information. One explanation for this may be that we are still more willing to believe in expert judgment than in ‘plain’ indicator systems. Searching for a general definition for the evaluation of university research, we can propose a focused characterization: evaluations are processes, using the methods of peer review or of peer review and indicators, for the objectives of analyzing, measuring


and judging the quality of university research, which finally should improve the quality of university research.6 In addition, evaluations of university research should fulfill the following functions (Campbell, 1999, pp. 370−72; Campbell and Felderer, 1997, pp. 68−75):

1. implement complex and sophisticated feedback mechanisms into the university systems, maybe comparable to what democracy does for the political system and the ‘market’ for economic systems (evaluations gradually convert universities from ‘black boxes’ into ‘white boxes’);
2. help to create and further an ‘academic market’, by emphasizing market or market-similar principles, however adapted to academic needs;
3. support the improvement of the ‘rationality’ and decision-making of university systems, since evaluation results may be used partially as criteria and references for resource allocation, the promotion of individual careers and organizational reform (‘institutional learning’);
4. legitimate the use of public resources, particularly of public basic funding (GUF), by the universities (university systems), vis-à-vis the public funders and decision-makers, and the public and society in general; evaluations emphasize transparency and accountability.

Concerning the ‘dimensionality’ of university research quality, different conceptualizations are possible. One approach is to define only one dimension, simply termed ‘quality’. The UK followed that evaluation policy path, using one comprehensive quality dimension. Alternatively, more than one quality dimension can be defined. Such a multidimensionality may be based on ‘quality’, ‘efficiency’, ‘relevance’, and ‘viability’. Within such a conceptualization, quality then refers more specifically to scientific (academic) quality, by assessing, for example, how new or innovative university research is and by critically reviewing the quality of the publications that contain the research results. Efficiency relates research input to research output. Relevance can have at least two meanings: research being relevant for other university (or academic) research in the sciences; or research with a high application impact on society and technological progress. Viability assesses the organizational context of university research: for example, whether university departments define mission statements with explicit research goals and also implement measures (for example, personnel development plans) which refer directly to these research objectives; furthermore, the concept of viability reflects whether university departments develop criteria that act as a crucial frame of reference (benchmark), against which the degree of realization of the self-defined research objectives is tested. Contrary to the UK, the Netherlands opted for a research evaluation model that uses four quality dimensions (see Figure 7.6; see also Campbell, 1999, pp. 375−76).

Quality, efficiency, relevance, and viability are understandable as ‘first-level’ quality dimensions, directly measured and judged upon during individual evaluation procedures. On top of these one can also define ‘second-level’ or ‘meta-level’ dimensions of research quality. Effectiveness may serve as an example of such a ‘higher’ or ‘advanced’ dimension, focusing on the question: how effective is university research? Effectiveness often is used as a policy term, and should express the degree of achievement of certain (research) objectives. Still, in practice, a consistent operationalization (application) of that concept often proves difficult. Within our line of argument, ‘effectiveness’ may be modeled as a combined derivation of ‘first-level’ dimensions, allowing the statement of specific and distinct effectiveness profiles for various institutions (or disciplines).7 Consequently, different ‘effectivenesses’ arise: some institutions might do better concerning efficiency, others perhaps demonstrate saliency with regard to relevance. Another example of a second-level dimension is the question of the degree to which research quality impacts the organizational framework of universities and leads to organizational (institutional) improvement (see, once more, Figure 7.6). The existence of a comprehensive evaluation system, with numerous evaluations being carried out, obviously supports ambitions to include second-level quality dimensions in evaluative analyses.

[Figure 7.6 Typology of different dimensions of research quality. Conceptual typology: first-level quality dimensions are quality, efficiency, relevance and viability; second-level (or meta-level) quality dimensions are effectiveness (how effective?), organizational (institutional) improvement of universities, (evolutionary) mid-term and long-term increase of research quality, and co-evolution of research quality and research evaluation. Empirical typology: the UK model of institutional ex-post evaluation of university research uses one (comprehensive) dimension of research quality (quality); the Netherlands model uses four dimensions (academic quality, academic productivity, relevance, and (long-term) academic viability). Source: Author’s own conceptualization based on Campbell (1999).]

AN EMPIRICAL TYPOLOGY OF COMPREHENSIVE INSTITUTIONAL EX-POST EVALUATION OF UNIVERSITY RESEARCH IN EUROPE

If countries are clustered with regard to the ‘comprehensiveness’ of university research evaluation, then the following typology can be proposed for discussion: the United Kingdom and the Netherlands may be classified as Type A, and Germany and Austria as Type B. Crucial in that distinction is that in the UK and the Netherlands there already exists an evaluation system for university research that applies comprehensive institutional ex-post evaluations, whereas in Germany and Austria it does not. We have already elaborated why that particular evaluation approach fits well with the funding mode of public basic funding, called GUF. Important features of Type A, relevant for the UK and the Netherlands, are (see Figure 7.7; see also Campbell, 1998, p. 17; see furthermore HEFCs, 2001; VSNU, 1998; for other classification possibilities see Geuna, Hidayat and Martin, 1999):

1. The ex-post evaluations cover the whole university system at national level, thus expressing a broad (‘maximum’) comprehensiveness. Universities in the UK which do not participate in these evaluations are automatically excluded from public basic funding for research.
2. For the purpose of ex-post evaluation a specific number of disciplines (subjects or fields) is defined: per discipline usually one expert panel is installed, and all university institutions (university departments) are assigned to those disciplines. The expert panels then judge (evaluate) the quality of university research for those departments for which they are responsible according to the disciplinary logic.
3. Methodically these evaluations represent a peer-review approach, in which peers also use indicators (indicator information).
4. As a part of the evaluation procedure, the panels (peers) also judge the research quality with an explicit reference to a ‘rating scale’: university research, therefore, is ‘numerically graded’.
5. The ex-post evaluations in the UK and the Netherlands may be characterized as disciplinary-based institutional research evaluations, in which universities are referred to as a disciplinary matrix. Put in summary, these evaluations follow a ‘systemic and comprehensive approach’.

[Figure 7.7 Comparative typology of university research evaluation in Europe with regard to the comprehensiveness of institutional ex-post evaluations. Type A countries (nations): United Kingdom (UK) and the Netherlands, with systematic and comprehensive evaluations, at national level and across all disciplines, with explicit references to grading scales (‘systemic and comprehensive approach’); disciplinary-based institutional ex-post research evaluations. Type B countries (nations): Germany and Austria [Finland (1990s)? Switzerland (1990s)?], with individual and disciplinarily independent evaluations, and without (frequent) references to explicit grading scales (‘pluralized and situational approach’). Source: Author’s own conceptualization based on Campbell (1998).]
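To make the preceding typology of quality dimensions concrete, here is a minimal Python sketch of how a Netherlands-style multidimensional rating might be recorded, and how a ‘second-level’ effectiveness profile could be derived from the first-level dimensions as discussed above. The unit names, scores and weighting scheme are purely illustrative assumptions, not part of any actual evaluation protocol.

```python
# Minimal sketch: a Netherlands-style multidimensional rating (five-point
# scale per dimension) and one possible derivation of a 'second-level'
# effectiveness profile. Units, scores and weights are hypothetical.

from dataclasses import dataclass

DIMENSIONS = ("quality", "productivity", "relevance", "viability")

@dataclass
class ResearchRating:
    unit: str                      # e.g. a research program or department
    quality: int                   # 1 (poor) .. 5 (excellent)
    productivity: int
    relevance: int
    viability: int

    def effectiveness(self, weights=None) -> float:
        """Combine first-level dimensions into one illustrative
        'second-level' score (the weights are an assumption, not a standard)."""
        weights = weights or {d: 1.0 for d in DIMENSIONS}
        total_weight = sum(weights.values())
        return sum(getattr(self, d) * weights[d] for d in DIMENSIONS) / total_weight

ratings = [
    ResearchRating("program A", quality=5, productivity=3, relevance=4, viability=4),
    ResearchRating("program B", quality=4, productivity=5, relevance=3, viability=3),
]

for r in ratings:
    print(f"{r.unit}: effectiveness profile = {r.effectiveness():.2f}")
```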


Ex-post evaluations in the UK are called Research Assessment Exercises (RAEs). Four RAEs already have been carried out (1986, 1989, 1992, and 1996), and a fifth RAE currently is being conducted. This 2001 RAE finally should be completed in 2002. From the 1989 RAE onwards, basically the same evaluation methods have been used. In the Netherlands the first ex-post evaluation cycle, also called Research Assessment, already is completed. Currently a second ex-post evaluation already is under way, which expectedly should be finished by 2004. These comprehensive institutional ex-post research evaluations represent the main evaluation activities of university research in the UK and the Netherlands, that is, they define the primary evaluative frame of reference: in addition, of course, other evaluation initiatives are also carried out individually at sub-national level that may address university research in some respect.

In Germany and Austria such a system of comprehensive institutional ex-post evaluation of university research at national level does not exist for the moment, and it remains an open policy question if something like this is desirable and should be implemented in the future. In that respect the Type B cluster defines itself very much in contrast to Type A, in the sense that a specific set of university research evaluations is missing. Type B applies a ‘pluralized and situational approach’, with a variety of individual and ‘ad hoc’ evaluations, often at a sub-national level or ‘meso-level’. Evaluations, also disciplinary evaluations, are carried out independently, without a common and cross-disciplinary evaluation standardization at national level. Furthermore, university research evaluations in Germany and Austria normally refuse to grade research quality on a rating scale.

If there is an interest in applying the Type A and Type B distinction to other European countries, then nations like Finland and Switzerland probably, at least during the period of the 1990s, would qualify for the Type B cluster. In both countries an extensive evaluation system operates, under certain considerations clearly more systematic than in Germany or Austria. However, in Finland as well as in Switzerland a common and well-defined conceptual and methodic frame of reference for university research evaluation, as comprehensive and systemic as in the UK and the Netherlands, still is not in place. Finland can refer to an extensive evaluation experience with regard to the development of indicator systems, the evaluation of specific research fields (research disciplines) and comprehensive institutional evaluations of individual universities (Academy of Finland, 1997; Liuhanen, 2001; Ministry of Education, 1993a; 1993b; 2001). In Switzerland several disciplines also were evaluated, such as physics, the social sciences and humanities (SWR, 1993; 1995; 1998). In addition, Switzerland developed a sophisticated expertise with regard to bibliometric indicators, that is article publications in international journals (SWR, 1992; 1999). However, to


classify Finland and Switzerland as Type A countries, the following conditions would have to be fulfilled: the evaluation of research fields or disciplines must cover the whole institutional spectrum of the university systems; there must be a common methodic standardization for all individual discipline-based institutional evaluations; and the temporal duration for a complete ‘evaluation cycle’ of all research fields or disciplines may not exceed a limited number of years. On the other hand, the applied principle of research-field or disciplinary based evaluations clearly marks a conceptual interface, according to which Finland and Switzerland could convert into Type A countries.

COMPARISON OF THE METHODS OF COMPREHENSIVE INSTITUTIONAL EX-POST RESEARCH EVALUATIONS IN THE UNITED KINGDOM AND THE NETHERLANDS

In the following, the methodic details of the institutional ex-post research evaluations in the UK and the Netherlands are comparatively described and summarized.

Institutional Responsibility and Supervision of the Evaluations

UK: the RAEs are organized by the Higher Education Funding Councils (HEFCs). Four of these exist, one each in England, Wales, Scotland, and Northern Ireland. The HEFCs can be characterized as ‘intermediary’ public institutions, responsible for the university system and the public basic funding (GUF) of university research as well as other university activities (for example, teaching). Thus the HEFCs are located ‘between’ the government (ministries) and the higher education sector.

Netherlands: the ex-post evaluations are organized by the VSNU (‘Vereniging van Samenwerkende Nederlandse Universiteiten’), an association of the fourteen major Netherlands universities. The VSNU can be characterized as an association which is ‘part’ of the university system.

Comment: systemically speaking, the institutional ex-post research evaluations in the Netherlands resemble more something like a ‘self-evaluation’ of the university system. Dutch universities were granted an extended autonomy during the 1990s, and the government continues generous public (basic) funding. This, however, was coupled to the expectation that universities engage in comprehensive evaluations. In the UK the ex-post research evaluations are linked more closely to public institutions and the public decision-making concerning funding.


Temporal Duration of Evaluation Cycles

UK: approximately one year (or even less than one year, with regard to the core evaluation procedure).

Netherlands: approximately seven years (groups of disciplines are evaluated in an overlapping sequence); the first evaluation cycle was carried out 1993−98, the second is scheduled for 1998−2004.

Number of Disciplines

UK: the disciplines are called ‘units of assessment’. During the history of the RAEs a continuous reduction of disciplines can be observed (1989 RAE: 92 disciplines; 1992 RAE: 72; 1996 RAE: 69). Currently, in the context of the 2001 RAE, there are 68 so-called units of assessment. Too high a number of disciplines can produce institutional assignment problems.

Netherlands: constantly 34 disciplines (during both evaluation cycles).

Comment: does this assignment of university institutions (departments) to different disciplines not resemble a ‘conservative’ approach, which is in potential conflict with demands like inter-, multi- or trans-disciplinarity (see Gibbons et al., 1994, pp. 17−45)? One reason for carrying out discipline-based institutional assignments can be derived from the experience that within each discipline there exist specific conditions and patterns of research (see Daniel, 1995). These specific (‘historical’) patterns should, of course, be explicitly reflected during processes of peer review. In such an understanding ‘inter- or transdisciplinarity’ are integrated as important ‘elements’ into the evaluation procedure, used as key references for judging the quality of university research. The question to be answered by peer-based ex-post evaluations is: does university research, conventionally organized in a disciplinary manner, also express an inter- or transdisciplinary profile?

Smallest Institutional Unit for Research Evaluation

UK: the university departments (of the universities).

Netherlands: the research programs; a ‘university department’ normally consists of several research programs, while some research programs cross-cut the boundaries of departments. Research at Dutch universities conventionally is organized in modules of research programs.

Comment: in the Netherlands the smallest institutional units for evaluation are located at a ‘lower’ institutional level than in the UK. The fact that the Netherlands is a small- to medium-sized country perhaps favored such a more disaggregated evaluation approach. For other small-sized European countries or sub-national units (such as the Länder in Germany) this might be of particular interest.


Characteristics of the Peer Panels per Discipline

UK: the peer panels, installed per discipline, are called ‘Assessment Panels’. On average a panel consists of nine to 18 experts, and they are drafted from a broad spectrum of institutions. Sometimes sub-panels are additionally installed. Current panel chairs were nominated from the panel members of the 1996 RAE and were commonly appointed by the HEFCs. Important ‘balancing criteria’ for panel membership are: covering different UK regions, and coming from so-called ‘old’ and ‘new’ universities.

Netherlands: the discipline-based peer panels are called ‘Review Committees’. The panel chair normally is Dutch, decided upon by a consensus-oriented selection process in which the institutional academic key actors are involved. A majority of panel members is drafted from outside the Netherlands. For most disciplines the core evaluation procedure is carried out in English.

Comment: in the UK as well as in the Netherlands the general procedure and the main applied methods of the institutional ex-post research evaluations are standardized across all disciplines and published prior to the beginning of the evaluation (HEFCs, 1995a; 1999; 2001; VSNU, 1994; 1998). Discipline-based peer panels can adapt and specify the evaluation methods, but there must not be a fundamental conflict or contradiction with the general methodology (see, for example, HEFCs, 1995b). On the one hand, this guarantees ‘horizontal’ compatibility, so that evaluation results of different disciplines (and university institutions) can be compared. But ‘vertical’ (or ‘temporal’) compatibility is also supported, so that results of different evaluation cycles may be related and developments over time become visible. Panel membership in the UK conventionally is British. In the Netherlands, on the contrary, it was decided explicitly to draft primarily international peers: the evaluation procedure and communication are based mainly on English, and evaluation documents are in English or bilingual. Exceptions are only granted for ‘domestically oriented’ disciplines, where evaluations may be performed in Dutch.

Content of Submissions to the Peer Panels

UK: general information and ‘up to four items of research output for each researcher’ (HEFCs, 2001, p. 4) for a predefined period of years.

Netherlands: general information and ‘the five best academic key publications’ (VSNU, 1998, p. 33) per research program for a predefined period of years.

Comment: each ‘smallest institutional unit’, depending on its disciplinary assignment, must report the requested information in the form of submissions to the responsible peer panel. In the case of the UK the general information


includes ‘staff information’, ‘research output’, ‘textual descriptions’, and ‘related data’ (see again HEFCs, 2001). Extremely important for the whole RAEs are those four best items of research output per researcher, which often (or even mostly) will be publications, but are not necessarily restricted to these. Any research output accessible to the public qualifies. Researchers and university departments themselves decide what their best research output is. This prevents the ex-post evaluation in the UK from being primarily oriented towards quantitative indicators, so that the actual peer review and the final judgment of the peers dominate the whole evaluation process. A convincing standardization of publication indicators often is difficult, however not impossible (see again Figure 7.5). In the Netherlands also more aggregated institutional units than the research programs, such as the institutes (departments) or faculties, report and forward information to the panels. Concerning research input, the Dutch place a major emphasis on the different funding modes: public basic funding and P&P funding (public, private and foreign). With regard to some categories, dissertations (PhD theses) also qualify as ‘research output’. Thus demands for the interconnectedness of research and teaching are stressed (VSNU, 1998, pp. 25−40).

Number of Dimensions of the Quality of University Research

UK: one quality dimension.

Netherlands: four quality dimensions; these are ‘academic quality’, ‘academic productivity’, ‘relevance’, and ‘academic viability’.

Comment: the fact that in the UK the P&P funding (non-GUF) dominates university research (see Figures 7.3 and 7.4) might serve as an explanation of why the ex-post evaluations focus so much on a ‘core’ concept of ‘quality’. Quality is understood as ‘overall quality of the research’ (HEFCs, 2001, p. 5). Contrary to the UK, in the Netherlands still a substantial majority of university research is financed through public basic funding (GUF). Therefore the specific assessment of productivity (efficiency) and relevance makes particular sense in the Dutch case (see again Figure 7.6). However, also from a perspective of epistemology this multidimensional definition of university research quality, as applied in the Netherlands, is interesting, since it highlights the involved complexity.

Rating Scales for the Quality of University Research

UK: seven-point rating scale; 1, 2, 3b, 3a, 4, 5, and 5*, with 5* being the best grading.

Netherlands: five-point rating scale; 1 (poor), 2 (unsatisfactory), 3 (satisfactory/average), 4 (good), and 5 (excellent).


Comment: in the United Kingdom a certain trend became manifest whereby the rating of some disciplines improved with each new RAE being carried out. Since this involved the potential of a ‘ceiling’ effect for the better performing university departments, it was decided to broaden the rating scale. On top of 5 a 5* was added, and 3 was subdivided into 3b and 3a. This obviously provokes the question whether increases in quality ratings during the process of sequential RAEs should also be interpreted as a ‘real’ quality progress of university research. With regard to the Netherlands, the evaluation procedure demands that the dimensions of quality and viability always are referred to those rating scales. Productivity and relevance may be excluded from a numerical grading, which, however, only seldom is the case (VSNU, 1998, p. 8).

Evaluation Reports of the Peer Panels

UK: a comprehensive document is published that reports for each university department the specific rating by the expert panels (‘assessment panels’). That document also contains overviews of academic staff (see HEFCs, 1996). During the course of the 2001 RAE a specific report for each discipline (unit of assessment) will also be released, and confidential ‘feedback reports’ will be forwarded to the university institutions (HEFCs, 2001, pp. 6−7).

Netherlands: for each covered discipline a comprehensive report is published that contains the conclusions of the expert panels (‘review committees’). Those reports address the following issues: description of the evaluation procedure and the terms of reference; assessment of the discipline at national level; assessments of the discipline at the levels of universities or faculties; and detailed reporting on the individual research programs. For each research program the multidimensional rating and the specific recommendations are made public. During the currently conducted second evaluation cycle in some disciplines the program-based education also is assessed and graded (using, for example, a 10-point rating scale). Disciplines already covered by the second evaluation cycle are philosophy, mechanical engineering, environmental sciences, and marine technology (VSNU, 2000a; 2000b; 2000c; 2000d).

Comment: earlier RAEs had a tendency to focus on publishing the rating results. This may be explained by the fact that in the UK the results of those institutional ex-post research evaluations directly impact the public basic funding of university research. Later RAEs appear to become broader in their scope, by also emphasizing recommendations. Ex-post evaluations in the Netherlands always explicitly stressed the formulation of recommendations and analyzed research quality in a comprehensive (institutional) context. Whether evaluations of university research also


should automatically address the quality of program-based teaching (‘study programs’) seems to become an issue of greater importance for the Netherlands.

Direct and Formal Consequences of Evaluation Results for the Public Funding of University Research

UK: there is a direct and formal linkage between the ex-post evaluation results and the public basic funding (GUF) of university research, since the evaluation outcome flows directly into a funding formula for QR (‘quality-related research’). Simplified, the QR formula is: ‘Amount = Quality x Volume’ (HEFCE, 1996, p. 16; 1997, p. 16). The volume indicators refer primarily to ‘research active academic staff’ and, to a significantly lesser extent, also to research assistants, research fellows, postgraduate research students, and research income from charities. Quality reflects the quality rating of the departments, according to the assessment by the expert panels. Departments that only received a research rating of ‘1’ or ‘2’ are omitted from further public basic funding. Furthermore, discipline-specific ‘cost weights’ are introduced that determine the total funding amount for each discipline. In a bottom-up procedure a ‘virtual’ QR funding amount is calculated per university department by multiplying the quality with the volume indicators; these amounts are then added together to give a total for each university. The university then receives an aggregated QR ‘funding amount’ from the HEFCs (HEFCE, 2000a, pp. 16−19; HEFCs, 2001, p. 6). Of the total public basic funding of university research, called GUF, the biggest portion is determined by QR. For example, for England and the funding policy of the HEFCE (Higher Education Funding Council for England), evaluation results impacted through QR the following shares of GUF in the following academic years:8 1993−94, 94.7 per cent; 1996−97, 94.4 per cent; 1997−98, 97.2 per cent; and 1999−2000, 97.7 per cent (HEFCE, 1993, p. 27; 1996, p. 15; 1997, p. 15; 2000a, p. 16).
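As a purely illustrative reading of the QR formula just described, the following Python sketch computes a department-level QR amount as Quality x Volume with a discipline cost weight. The rating-to-weight mapping, cost weights, unit of resource and volume figures are hypothetical assumptions, not the actual HEFCE funding parameters.

```python
# Minimal sketch of the QR ('quality-related research') funding logic:
# Amount = Quality x Volume, scaled by a discipline cost weight.
# The quality weights, cost weights and volumes below are hypothetical,
# not the actual HEFCE funding parameters.

# Hypothetical mapping from RAE rating to a quality weight;
# ratings of 1 or 2 attract no QR funding.
QUALITY_WEIGHTS = {"1": 0.0, "2": 0.0, "3b": 1.0, "3a": 1.5,
                   "4": 2.25, "5": 3.375, "5*": 4.05}

def qr_amount(rating: str, volume_fte: float, cost_weight: float,
              unit_of_resource: float) -> float:
    """'Virtual' QR amount for one department (bottom-up calculation)."""
    return QUALITY_WEIGHTS[rating] * volume_fte * cost_weight * unit_of_resource

# Hypothetical departments of one university:
# (rating, research-active staff FTE, discipline cost weight)
departments = [("5*", 42.0, 1.6), ("4", 28.5, 1.0), ("2", 12.0, 1.0)]
total = sum(qr_amount(r, v, c, unit_of_resource=10_000.0) for r, v, c in departments)

print(f"Aggregated QR funding amount for the university: £{total:,.0f}")
```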


Netherlands: there is no direct and formal linkage between the ex-post evaluation results and the public basic funding (GUF) of university research. However, evaluation results impact the university-internal decision-making: funding for research programs that only scored a ‘3’, or even less, may be cut back. When researchers or university departments apply for specifically ‘expensive’ P&P funding, like major research programs or the implementation of ‘centers of excellence’, their past rating can also be taken into consideration by the funding bodies. A study on the effects of those ex-post evaluations proposes the following findings (Westerheijden, 1997): first, there is a rising (‘self-reflexive’) awareness for research and research quality; the managerial skills at Netherlands universities improve; ‘informal reputation’ becomes more visible; but there is also an increased ‘climate of competitiveness’ between the researchers (see also Rip and Meulen, 1995, pp. 50−51).

Comment: derived from pure analytical considerations, the UK evaluation model expresses a certain fascination, since of the European countries covered by our analysis it represents the most comprehensive approach by directly linking evaluation outcome with public basic funding. Perhaps also in a global context the UK research evaluation system qualifies as saliently comprehensive, and the first UK RAE was carried out about six years in advance of the launch of the first evaluation cycle in the Netherlands. At the same time there exists some reluctance among academic and particularly university communities in other Central European countries to have such a systematic linkage between evaluation outcome and public funding implemented. From the beginning, the institutional ex-post research evaluations in the Netherlands focused more on emphasizing and stimulating processes of institutional learning and ‘self-learning’ of the universities. Of course, Dutch evaluation results also influence university-internal and university-external decision-making. However, they are used more as references for supporting strategies of university reorganization, and are not directly linked to a public (basic) funding formula. Thus the Netherlands evaluation model clearly is ‘moderate’, and this may partially explain its ‘attractiveness’ and its influence on evaluation discourses in Germany and Austria. Perhaps also for the UK, in the future, considerations of supporting institutional self-learning will gain importance on the agenda of research evaluation.

THE EVALUATION OF UNIVERSITY RESEARCH IN GERMANY AND AUSTRIA

In Germany and Austria a system of comprehensive institutional ex-post research evaluation, addressing at national level all of the disciplines, currently is not in place. With regard to Germany, several structural and cultural factors can be stressed that constrained the development of such comprehensive evaluations (see Campbell and Felderer, 1997, pp. 52−62). In Germany the primary competence for the universities and their basic public funding (GUF) is not a domain of the federal government, but represents a responsibility of the sub-national governments at Länder level, the so-called ‘federal states’. Germany is composed of 16 Länder; therefore different frames of reference concerning public policy vis-à-vis the universities coexist in parallel (for an overview, see also Müller-Böling, 1995). On the one hand this offers the opportunity of ‘pluralism’ and allows for ‘experiments’ with different evaluation models. At the same time, however, problems of incompatibility potentially occur between different Länder-based


models of university research evaluation. German university research evaluation therefore is confronted with the challenge of how to realize a minimum compatibility between Länder evaluations. Such standardizations of compatibility would allow ‘horizontal’ (across Länder) and ‘temporal’ (across successive years) comparisons of research quality.

In Germany, expressed in R&D expenditure, the university-related research (‘außeruniversitäre Forschung’) generates a financial volume which almost equals university research (OECD, 2000a, pp. 22−23). One can claim that German university-related research already has been exposed more systematically to research evaluations than German university research (see Krull, 1994, p. 206; 1999; Kuhlmann, 1998, p. 87). Since the federal government, in co-operation with the Länder, also is responsible for the basic public funding of university-related research institutions (BMBF, 1998b, p. 8), this specific institutional setting supported the national comprehensiveness of university-related evaluations. German unification mobilized a major push for evaluations, as the specific challenge arose of integrating the former East German academy research institutes into the new all-German framework of academic research. The ‘West German’ Science Council (‘Wissenschaftsrat’) organized a systematic evaluation of those East German academy institutes during the years 1990−91: as a result, academy institutes were either dissolved or could continue to operate, then often converted into institutes of the so-called Blue List (now renamed Leibniz institutes). The Blue List represents an important segment of Germany’s university-related research, together with the Max Planck Society, Helmholtz Centers, and Fraunhofer Society. There already existed in the past a tradition of evaluating individual research institutes of the Blue List. Since the Blue List represented an ‘institutional winner’ of unification, by taking over substantial parts of the East German academy complex, it was decided in the mid-1990s to expose the whole Blue List to a comprehensive institutional evaluation that also focused specifically on research (Campbell and Felderer, 1997, pp. 97−106). This evaluation cycle, which methodically emphasized the combination of self-evaluations and external evaluations, was completed in 2000 (see again Röbbecke and Simon, 2001). In connection with that comprehensive evaluation there was also discussion about ‘redefining’ the Blue List as a (university-related) ‘research platform’, funded by the public: however, based on performance and evaluation results, the institutional membership of that platform would be flexible; that is, institutes could apply and become members, whereas other institutes might again lose their membership status.

A major emphasis of university reform in Germany focuses on developing comprehensive formulas for the public funding of universities, which rely on indicators and refer to the output profile of universities. Indicators and/or evaluations represent conceptual possibilities for ‘measuring’ university


According to Ziegele (2000), four different models for performance-oriented public funding formulas of universities currently exist in Germany: pure indicator models; indicator models combined with agreements on specific goals; the extension of already existing funding schemes (‘historical input orientation’) by agreements on specific goals; and the extension of agreements on specific goals by indicator-based incentives. These models compete with one another, and different Länder opt for different models. Such funding formulas do not address university research alone, but structure the public funding of the whole university, covering all university activities. However, the indicators used in the contemporary context of formula funding focus more on teaching (and education) and less on research. The extent to which total public Länder funding is determined by these indicator-based formulas still varies substantially. In parallel to public funding formulas, some German universities have also implemented university-internal reallocation schemes that redistribute part of the university funds on the basis of the performance of university departments and faculties. The performance measures involved rely on teaching and research indicators. The Free University of Berlin pushed forward with such a university-internal redistribution of resources as early as the beginning of the 1990s (Campbell and Felderer, 1997, pp. 86−91; see also Wex, 1995). Baden-Württemberg, one of the German Länder, decided to implement, beginning in 2000, a comprehensive evaluation system of teaching and research addressing its own universities. In accordance with the Netherlands model, these evaluations are to cover all of Baden-Württemberg and to be carried out cyclically, with a specifically installed evaluation agency in charge (CHE, 2000, pp. 7, 35−37). Lower Saxony (‘Niedersachsen’) recently also decided to apply a comprehensive evaluation model of university research, addressing the whole disciplinary spectrum on the methodic basis of ‘informed peer review’. Specific dimensions of the research evaluation will be: quality and relevance; effectiveness and efficiency; and issues of ‘structural’ policy. University-related research institutes may also take part in these exercises (Wissenschaftliche Kommission Niedersachsen, 2000). At the same time there exists a certain reluctance in Germany to use evaluation results as indicators in the public funding formulas of universities (CHE, 2000, p. 38). In that logic, research evaluations should support institutional learning processes of universities, but should not broadly determine public basic funding.
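To make the mechanics of such indicator-based formula funding more concrete, a minimal sketch is given below. The indicators, the weights and the goal-agreement component are purely illustrative assumptions and do not reproduce the formula of any particular Land.

\[
F_i \;=\; (1-\alpha)\, B \sum_{k} w_k \,\frac{x_{ik}}{\sum_{j} x_{jk}} \;+\; \alpha\, G_i ,
\qquad \sum_{k} w_k = 1
\]

Here $B$ is the total budget distributed by formula, $x_{ik}$ is the value of indicator $k$ for university $i$ (for example, graduates, completed doctorates or externally acquired research income), $w_k$ is the weight attached to indicator $k$, $G_i$ is a component negotiated through agreements on specific goals, and $\alpha$ is the share reserved for such agreements. Setting $\alpha = 0$ corresponds to a pure indicator model, whereas a large $\alpha$ corresponds to models in which agreements on goals dominate and indicators merely add incentives at the margin.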


In Austria several research evaluations have already been carried out. The first discipline covered comprehensively at national level was physics, in the early 1990s; electronic engineering (‘Elektrotechnik’) and biochemistry followed. The Vienna University of Veterinary Medicine was also subjected to a detailed institutional evaluation. The current situation can be described as a ‘pluralized and situational approach’, with a sequence of different evaluations. A system of comprehensive national ex-post research evaluation, however, has not been implemented for the universities (or for university-related research). The current evaluation debate therefore also focuses on the question of whether there is a need for evaluations that cover all disciplines comparatively. Some policy studies strongly recommend that Austria implement ‘systemic’ evaluations of university research and plead for comprehensive models that draw on the experiences of research evaluation in the Netherlands (Campbell and Felderer, 1999, p. 12). A particular issue is the extent to which evaluation models of university research should also address university-related (‘außeruniversitäre’) research (Campbell, 1999, pp. 378−81). The contemporary Austrian university system is undergoing substantial reforms. As part of that reform process the autonomy of the universities vis-à-vis the (funding) government is being increased. This creates demands for indicator-based public funding formulas of universities, into which agreements on goals and performance (‘Ziel- und Leistungsvereinbarungen’) should be incorporated. Current developments in Germany with regard to a general public formula funding of universities are frequently referred to in Austria (see also ÖRK, 1998a; 1998b). Austria is therefore in the advantageous situation of being able to observe closely evaluation and other trends in other European countries for the purpose of domestic policy decisions and policy implementation.

CONCLUSIONS

Derived from the proposed typology of Type A and Type B countries (see again Figure 7.7), different scenarios are conceivable for how countries will relate to each other in their future evaluation policies of university research. First scenario: if evaluation systems applying comprehensive institutional ex-post research evaluations at national level are considered competitive and ‘evolutionarily advantageous’ with regard to the functional needs of advanced, knowledge-based and research-intensive societies, then a general conversion of Type B to Type A is to be expected. Such an understanding implies that the UK and the Netherlands should be interpreted as being ahead of other countries in the field of university research evaluation policies, and it would represent a ‘policy failure’ should Germany and Austria fail to implement comprehensive ex-post research evaluations successfully. A second scenario stresses the idea of cultural or societal pluralism, emphasizing differences between societies also in relation to policy and research evaluation. Consequently, no policy conversion would


result, and the distinctions of Type A versus Type B would continue to exist; Germany and Austria would simply be ‘different’ from the UK and the Netherlands. A third scenario, finally, points to possibilities of interaction and ‘mutual overlapping’ between Type A and Type B, implying processes of mutual learning across different country-based policy models. Of course, there can also be processes of policy learning within one cluster type, for example between the United Kingdom and the Netherlands. Combinations of the various scenarios could also be considered (see Figure 7.8).

With the successive application of comprehensive institutional ex-post research evaluations in some countries, questions naturally arise about the mid-term and long-term effects on the quality of university research. Can an increase in research quality, modest or substantial, be observed over time? In parallel, the evaluation systems themselves are subjected to analysis, testing whether a systematic ‘evolution of evaluation’ is at work. When results of university research evaluation directly affect the public basic funding of universities, as in the United Kingdom, assessments of the long-term effects of such linkages are required: do these university systems remain competitive, in the sense that universities with a lower quality ranking at one point in time are still in a position to challenge the other universities continuously and seriously? Hence the idea that evaluation results may affect public basic funding but should not influence public P&P funding too strongly: university institutions should always have a chance of improving their research performance by successfully participating in research projects and research programs. The Higher Education Funding Council for England recently commissioned studies on the effects of the Research Assessment Exercises (RAEs); the results seem to indicate acceptance among British university communities of continuing comprehensive research evaluation on the basis of peer review (HEFCE, 2000b; 2000c). Comprehensive institutional ex-post research evaluations are also interpreted as a guarantee that the mode of public basic funding (GUF) will be continued in the future and will not be replaced by extended P&P funding. In summary, this refers to the challenging question of whether a ‘co-evolution’ of research and research evaluation, or more specifically of university research and university research evaluation, is at work (see, once more, Figure 7.6).


Different possible scenarios:

Scenario one: functional needs of advanced societies
  Type A countries (nations): United Kingdom (UK), Netherlands
  Type B countries (nations): Germany, Austria [Finland?] [Switzerland?]

Scenario two: cultural and/or societal pluralism
  Type A countries (nations): United Kingdom (UK), Netherlands
  Type B countries (nations): Germany, Austria [Finland?] [Switzerland?]

Scenario three: interaction (mutual overlapping and learning)
  Type A countries (nations): United Kingdom (UK), Netherlands
  Type B countries (nations): Germany, Austria [Finland?] [Switzerland?]

Source: Author’s own conceptualization.

Figure 7.8 Typology of different scenarios concerning the further development of university research evaluation in Europe

ACKNOWLEDGMENT

The idea of fostering the evolution of academic research systems by applying evaluations (‘evolution through evaluation’) was brilliantly elaborated by Dr Wilhelm Krull, Secretary General of the Volkswagen Foundation, in the


context of a highly stimulating conversation in Berlin on the evening of 7 June 1999, during the conference ‘Evaluation of Science and Technology in the New Europe’.

NOTES

1. The ‘national systems of innovation’ already represent a well-defined and common term of discourse (see, for example, Lundvall, 1992; Nelson, 1993). However, I would stress that it is promising to broaden that concept to ‘national (supranational and global) systems of research and innovation’ (Campbell, 2000, p. 141).
2. In German, P&P funding can be called ‘Drittmittelfinanzierung’.
3. Money from charities or foundations classifies as PNP funding.
4. Some experts argue that GUF potentially reinforces the classical Humboldtian principle of the unity of research and teaching (‘Einheit von Forschung und Lehre’); see Schimank (1995).
5. Obviously, indicators can also reflect non-research activities of universities, such as teaching (education) and services.
6. In that respect we can distinguish between evaluations combining peer review and indicators, and pure (evaluation-‘free’) indicator systems.
7. Drawing a metaphorical reference to factor analysis in statistics, effectiveness would resemble a ‘factor’ and the first-level quality dimensions ‘variables’.
8. The British academic year starts on 1 August and ends on 31 July.

REFERENCES

Academy of Finland (1997), Evaluation of Electronics Research in Finland, Helsinki: Edita.
Bell, Daniel ([1973] 1999), The Coming of Post-Industrial Society. A Venture in Social Forecasting, New York: Basic Books.
BMBF – Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie (1998a), Faktenbericht 1998 zum Bundesbericht Forschung, Bonn: BMBF.
BMBF – Federal Ministry of Education and Research (1998b), Facts and Figures 1998, Bonn: BMBF.
Bozeman, Barry and Julia Melkers (eds) (1993), Evaluating R&D Impacts: Methods and Practice, Boston: Kluwer Academic Publishers.
Campbell, David F.J. (1998), ‘Evaluation von universitärer Forschung im europäischen Vergleich’, BUKO-Info (4), 14−17.


Campbell, David F.J. (1999), ‘Evaluation universitärer Forschung. Entwicklungstrends und neue Strategiemuster für wissenschaftsbasierte Gesellschaften’, SWS-Rundschau, 39 (4), 363−83.
Campbell, David F.J. (2000), ‘Forschungspolitische Trends in wissenschaftsbasierten Gesellschaften. Strategiemuster für entwickelte Wirtschaftssysteme’, Wirtschaftspolitische Blätter, 47 (2), 130−43.
Campbell, David F.J. and Bernhard Felderer (1997), Evaluating Academic Research in Germany. Patterns and Policies, Vienna (Institute for Advanced Studies): Political Science Series No. 48.
Campbell, David F.J. and Bernhard Felderer (1999), Empfehlungen zur Evaluation universitärer und außeruniversitärer Forschung in Österreich, Vienna (Institute for Advanced Studies): Political Science Series No. 66.
CHE – Centrum für Hochschulentwicklung (2000), Hochschulreform Baden-Württemberg 2000. Stellungnahme und Empfehlungen, Gütersloh: CHE.
Daniel, Hans-Dieter (1995), ‘Ist wissenschaftliche Leistung in Forschung und Lehre messbar?’, Universitas, 50 (3), 205−09.
European Commission (1995), Green Paper on Innovation, Brussels: EC.
European Commission (1997), Second European Report on Science and Technology Indicators 1997, Brussels: EC.
Felderer, Bernhard and David F.J. Campbell (1994), Forschungsfinanzierung in Europa. Trends, Modelle, Empfehlungen für Österreich, Vienna: Manz Verlag.
Geuna, Aldo, Dudi Hidayat and Ben Martin (1999), Research Allocation and Research Performance: The Assessment of Research. Study, Brighton: SPRU (University of Sussex).
Gibbons, Michael, Camille Limoges, Helga Nowotny, Simon Schwartzman, Peter Scott and Martin Trow (1994), The New Production of Knowledge. The Dynamics of Science and Research in Contemporary Societies, London: Sage Publications.
HEFCE – Higher Education Funding Council for England (1993), ‘Annual Report 1992−93’, Promoting Quality & Opportunity, Bristol: HEFCE.
HEFCE – Higher Education Funding Council for England (1996), A Guide to Funding Higher Education in England. How the HEFCE Allocates its Funds, Bristol: HEFCE.
HEFCE – Higher Education Funding Council for England (1997), Funding Higher Education in England. How the HEFCE allocated its Funds in 1997−98, Bristol: HEFCE.
HEFCE – Higher Education Funding Council for England (2000a), Funding Higher Education in England. How the HEFCE Allocates its Funds, Bristol: HEFCE.
HEFCE – Higher Education Funding Council for England (2000b), Impact of the Research Assessment Exercise and the Future of Quality Assurance in the Light of Changes in the Research Landscape. Final Report Prepared for HEFCE, Bristol: HEFCE.


HEFCE – Higher Education Funding Council for England (2000c), Review of Research, Bristol: HEFCE.
HEFCs – Higher Education Funding Councils (1995a), 1996 Research Assessment Exercise. Guidance on Submissions, Bristol: HEFCE.
HEFCs – Higher Education Funding Councils (1995b), 1996 Research Assessment Exercise. Criteria for Assessment, Bristol: HEFCE.
HEFCs – Higher Education Funding Councils (1996), 1996 Research Assessment Exercise. The Outcome, Bristol: HEFCE.
HEFCs – Higher Education Funding Councils (1999), Research Assessment Exercise in 2001. Guidance on Submissions, Bristol: HEFCE.
HEFCs – Higher Education Funding Councils (2001), A Guide to the 2001 Research Assessment Exercise, Bristol: HEFCE.
Hicks, Diana and Sylvan Katz (1996), Systemic Bibliometric Indicators for the Knowledge-Based Economy. Paper Presented at the OECD Workshop ‘New Indicators for the Knowledge-Based Economy’ (19−21 June 1996), Paris: OECD.
Hornbostel, Stefan (1997), Wissenschaftsindikatoren. Bewertungen in der Wissenschaft, Opladen: Westdeutscher Verlag.
IMD – International Institute for Management Development (1996), The World Competitiveness Yearbook 1996, Lausanne: IMD.
Krull, Wilhelm (1994), ‘Im Osten wie im Westen − nichts Neues? Zu den Empfehlungen des Wissenschaftsrates für die Neuordnung der Hochschulen auf dem Gebiet der ehemaligen DDR’, in Renate Mayntz (ed.), Aufbruch und Reform von oben. Ostdeutsche Universitäten im Transformationsprozess, Frankfurt am Main: Campus Verlag, pp. 205−25.
Krull, Wilhelm (ed.) (1999), Forschungsförderung in Deutschland. Bericht der internationalen Kommission zur Systemevaluation der Deutschen Forschungsgemeinschaft und der Max-Planck-Gesellschaft, Hannover, mimeo.
Kuhlmann, Stefan (1998), Politikmoderation. Evaluationsverfahren in der Forschungs- und Technologiepolitik, Baden-Baden: Nomos Verlag.
Liuhanen, Anna-Maija (2001), ‘Finland: Institutional Evaluation of Finnish Universities’, in Kauko Hämäläinen, Satu Pehu-Voima and Staffan Wahlén (eds), Institutional Evaluations in Europe. ENQA Workshop Reports 1, Helsinki: European Network for Quality Assurance in Higher Education, pp. 12−17.
Lundvall, Bengt-Åke (ed.) (1992), National Systems of Innovation. Towards a Theory of Innovation and Interactive Learning, London: Pinter Publishers.
Ministry of Education (1993a), Evaluation of the University of Jyväskylä. Report of External Visiting Group, Helsinki: MoE.


Ministry of Education (1993b), Evaluation of the University of Oulu. Report of External Visiting Group, Helsinki: MoE.
Ministry of Education (2001), Management by Results in Higher Education, Helsinki: MoE.
Müller, Karl H. (1999), Marktentfaltung und Wissensintegration. Doppel-Bewegungen in der Moderne, Frankfurt am Main: Campus Verlag.
Müller-Böling, Detlef (1995), ‘Qualitätssicherung in Hochschulen. Grundlage einer wissensbasierten Gesellschaft’, in Detlef Müller-Böling (ed.), Qualitätssicherung in Hochschulen. Forschung, Lehre, Management, Gütersloh: Verlag Bertelsmann Stiftung, pp. 27−45.
Nelson, Richard R. (ed.) (1993), National Innovation Systems. A Comparative Analysis, Oxford: Oxford University Press.
OECD − Organization for Economic Co-operation and Development (1989), The Measurement of Scientific and Technical Activities. ‘Frascati Manual’ Supplement, Paris: OECD.
OECD − Organization for Economic Co-operation and Development (1994), The Measurement of Scientific and Technological Activities. Proposed Standard Practice for Surveys of Research and Experimental Development. Frascati Manual 1993, Paris: OECD.
OECD − Organization for Economic Co-operation and Development (1998a), Education at a Glance. OECD Indicators 1998, Paris: OECD.
OECD − Organization for Economic Co-operation and Development (1998b), Science, Technology and Industry Outlook 1998, Paris: OECD.
OECD − Organization for Economic Co-operation and Development (1999a), OECD Historical Statistics 1960−1997, Paris: OECD.
OECD − Organization for Economic Co-operation and Development (2000a), Main Science and Technology Indicators, Paris: OECD.
OECD − Organization for Economic Co-operation and Development (2000b), Basic Science and Technology Statistics, Paris: OECD.
ÖRK – Österreichische Rektorenkonferenz (1998a), Evaluierung, Vienna: ÖRK.
ÖRK – Österreichische Rektorenkonferenz (1998b), Universitätspolitische Leitlinien, Vienna: ÖRK.
Porter, Michael E. (1990), The Competitive Advantage of Nations, New York: Free Press.
Raan, Anthony F.J. van (1995), ‘Bewertung von Forschungsleistungen: Fortgeschrittene bibliometrische Methoden als quantitativer Kern von Peer-review-basierten Evaluationen’, in Detlef Müller-Böling (ed.), Qualitätssicherung in Hochschulen. Forschung, Lehre, Management, Gütersloh: Verlag Bertelsmann Stiftung, pp. 85−102.
Rip, Arie and Barend J.R. van der Meulen (1995), ‘The Patchwork of the Dutch Evaluation System’, Research Evaluation, 5 (1), 45−53.


Röbbecke, Martina and Dagmar Simon (2001), ‘The Assessment of Leibniz Institutes: The Relationship between External and Internal Evaluation’, in Philip Shapira and Stefan Kuhlmann (eds), Learning from Science and Technology Policy Evaluation: Proceedings from the 2000 U.S.−European Workshop, Atlanta: School of Public Policy (Georgia Institute of Technology), and Karlsruhe: Fraunhofer Institute for Systems and Innovation Research, pp. 290−97.
Schimank, Uwe (1995), Hochschulforschung im Schatten der Lehre, Frankfurt am Main: Campus Verlag.
SWR – Schweizerischer Wissenschaftsrat (Peter Weingart, Jörg Strate and Matthias Winterhager) (1992), Forschungslandkarte Schweiz 1990, Bern: Forschungspolitik (FOP) 11 (1992).
SWR – Schweizerischer Wissenschaftsrat (1993), Revitalising Swiss Social Science. Evaluation Report, Bern: Research Policy (FOP) 13 (1993).
SWR – Schweizerischer Wissenschaftsrat (1995), Evaluation of Physics Research in Switzerland. Schlussbericht, Bern: Research Policy (FOP) 25 (1995).
SWR – Schweizerischer Wissenschaftsrat (1998), Evaluation der geisteswissenschaftlichen Forschung in der Schweiz. Ergebnisse und Empfehlungen des Schweizerischen Wissenschaftsrat. Kurzfassung, Bern: Forschungspolitik (FOP) 53 (1998).
SWR – Schweizerischer Wissenschaftsrat (1999), Forschungslandkarte Schweiz 1997. Bibliometrische Indikatoren der schweizerischen Forschung in den Jahren 1993−1997, Bern: Fakten & Bewertungen (F&B) 3 (1999).
VSNU – Vereniging van Samenwerkende Nederlandse Universiteiten (1994), Protocol 1994, Utrecht.
VSNU – Vereniging van Samenwerkende Nederlandse Universiteiten (1998), Protocol 1998, Utrecht.
VSNU – Vereniging van Samenwerkende Nederlandse Universiteiten (2000a), Philosophy, Utrecht.
VSNU – Vereniging van Samenwerkende Nederlandse Universiteiten (2000b), Mechanical Engineering, Utrecht.
VSNU – Vereniging van Samenwerkende Nederlandse Universiteiten (2000c), Environmental Sciences, Utrecht.
VSNU – Vereniging van Samenwerkende Nederlandse Universiteiten (2000d), Marine Technology. Quality Assessment of Education and Research, Utrecht.
Westerheijden, Don F. (1997), ‘A Solid Base for Decision. Use of the VSNU Research Evaluations in Dutch Universities’, Higher Education, 33 (4), 397−413.


Wex, Peter (1995), ‘Die Mittelvergabe nach Leistungs- und Belastungskriterien. Ein Beitrag zum Leistungswettbewerb in der Hochschule’, Wissenschaftsmanagement, 1 (4), 168−74.
Wissenschaftliche Kommission Niedersachsen (2000), Forschungsevaluation an niedersächsischen Hochschulen und Forschungseinrichtungen. Grundzüge des Verfahrens, Hannover.
Ziegele, Frank (2000), ‘Mehrjährige Ziel- und Leistungsvereinbarung sowie indikatorgesteuerte Budgetierung’, in Stefan Titscher, Georg Winckler and Hubert Biedermann (eds), Universitäten im Wettbewerb. Zur Neustrukturierung österreichischer Universitäten, Munich: Rainer Hampp Verlag, pp. 331−86.