Online use and information seeking behaviour: institutional and subject comparisons of UK researchers

David Nicholas, David Clark and Ian Rowlands CIBER: Centre for Information Behaviour and the Evaluation of Research, University College London, London

Hamid R. Jamali Department of Library and Information Studies, Tarbiat Moallem University, Tehran, Iran and CIBER: Centre for Information Behaviour and the Evaluation of Research, University College London, London

Abstract. The paper reports on the results of the project ‘Evaluating the usage and impact of e-journals in the UK’. Using deep log analysis techniques, we evaluated the use of the Oxford Journals database in regard to life sciences, economics and history by 10 major UK research institutions. The aim of the study was to investigate researchers’ digital behaviour, and to ascertain whether it varied by subject and discipline, or in relation to the institutions. The findings revealed significant subject and institutional differences. Life scientists were the biggest users. Economists made the greatest use of abstracts. Historians proved to be the most active searchers. Research-intensive universities were characterized by high-volume use and short session times, light sessions, and sessions which utilized few of the search functions available. Open access journals featured strongly in the ranked lists of life sciences and history; and Google was an extremely popular means of accessing journal content, especially so in the case of historians.

Keywords: e-journals; information seeking behaviour; institutional comparisons; log analysis; subject comparisons

Correspondence to: Hamid R. Jamali, Department of Library and Information Studies, Faculty of Psychology and Education, Tarbiat Moallem University, No. 49, Mofateh Ave, PO Box 15614, Tehran, Iran. Email: [email protected]

Journal of Information Science, 35 (6) 2009, pp. 660–676. © The Author(s), 2009. DOI: 10.1177/0165551509338341


1. Introduction

E-journals have proved an undoubted success, and study after study has demonstrated very high levels of usage, characterized information seeking behaviour and, to a more limited extent, gauged satisfaction. However, differences, possibly large ones, might be expected between subjects and between individual institutions, and this paper sets out to ascertain whether this is so in some detail, employing deep log methods for the purpose. It is believed that this is the first time UK researchers have been the subject of such an evaluation. The paper reports on some of the results of the Research Information Network (RIN) funded project ‘Evaluating the usage and impact of e-journals in the UK’ (see Note 1), which set out to investigate the impact of scholarly e-journals on the UK research community. The RIN study focused on two journal databases, ScienceDirect and Oxford Journals, and this paper covers the investigation of the latter. For an analysis of the former see [1].

2. Aims, objectives and scope

The aims of the paper are to:

• investigate scholars’ digital behaviour, in terms of levels and patterns of usage, type of content viewed, navigational preferences, and routes used to access e-journal content;
• ascertain whether scholars’ use and behaviours varied by subject and discipline, or in relation to the institutions in which they worked.

To this end the usage logs of the Oxford Journals (Figure 1) were evaluated in regard to use by 10 major UK research institutions (Table 1) and three representative subjects – one science (life sciences), one social science (economics) and one humanities subject (history). Oxford Journals (OJ) was chosen because it covered these three subjects as well as having a top-quality list of research journals (the median impact factor for Oxford journals was 2.8 in 2005, a figure surpassed only by Nature Publishing Group, the next highest impact publisher). In total, 61 journals and over half a million pages were evaluated from January to December 2007. Case study institutions were selected on the following grounds: (a) in the case of the universities, they had obtained a 4 or more in the Research Assessment Exercise (RAE) 2001 in one or more case study subjects, since the focus was on research-active academics; (b) for balance and comparison, they should provide a mix of large and small universities; (c) given the general neglect of government laboratories in the literature, it was important that they be represented. As mentioned above, the focus of the RIN study was largely on researcher use, although it was not possible to filter out all teaching and student learning use from the logs. However, given the high research ranking of the departments and journals involved (laboratories, of course, had no students) and the fact that others have found students to be minority users of the journal literature [2], we do not think this is a problem. Indeed, we estimated that student use constituted less than 15% of the use reported in this paper, and that much of this was research in connection with dissertation projects and higher degree research.

3. Literature review

Because log studies furnish such different data from survey studies, and because of the sheer number of the latter, we have chosen to concentrate mainly on log usage studies. The MaxData study [3] was perhaps the closest to this study in nature and coverage, although it covered US universities. That study focused on usage at four US universities, both research- and teaching-oriented centres. The quantitative stream of the study consisted of a deep log analysis of journal usage over 15 months on the OhioLINK platform. The most interesting finding was the difference between information seeking and use in teaching and research universities, which was largely a function of research activity and the size of the academic community.


Fig. 1. Oxford Journals homepage (www.oxfordjournals.org)

It was found that the most research-active universities recorded: (a) the shortest sessions; (b) the most focused ones – relatively low use of abstracts, fewer journals viewed in a session and fewer search (engine) pages; (c) views to the most current journals; and (d) the most browsing sessions. Two of the RIN case study fields (economics and life sciences) were the subject of analysis in the MaxData study, and in respect to these fields some of the main findings were: business and economics (28%) recorded the highest proportion of bouncers – visitors who used only a single page; life sciences recorded the longest article and abstract view times (respectively 77 and 73 seconds), and business and economics (49 seconds) some of the shortest.

The Authors as Users project [4] is also of great relevance here because it examined the logs of ScienceDirect, concentrated on researchers, was an international study with a high representation of UK users, and undertook subject comparisons. The aim of the project was to obtain a comprehensive and detailed understanding of the virtual scholar by linking together an Elsevier-produced author survey about attitudes towards scholarly publishing activities with use of ScienceDirect. Of the authors who filled in the questionnaire, 750 were matched accurately to the logs, which were collected over an 18-month period. Again there proved to be very real differences in behaviour between authors in respect to their subject field. In regard to the two RIN case study subjects featured, the following was found:

• life sciences recorded: (a) the highest number of article views; (b) the lowest views to journal homepages; (c) the highest views to both PDF and full-text articles in the same session; (d) the lowest views to articles in print; (e) the highest rate of views to regular articles; (f) the highest views to declining articles; (g) the highest rate of sessions recording one page view; (h) the highest proportion of sessions recording over 20 views; and (i) the highest proportion of sessions recording an abandoned search.

• economics and econometrics recorded: (a) the lowest views to article list and journal list pages; (b) the highest views to journal issue pages; (c) the highest abstract views; (d) the highest rate of articles in print viewed (based on journal subject); (e) the lowest rate of articles in print viewed (based on user subject);


Table 1 Case study institutions

Centre for Ecology and Hydrology (CEH)
Rothamsted Research (Agricultural Research Centre)
University College London
University of Aberdeen
University of Bangor
University of Cambridge
University of Edinburgh
University of Manchester
University of Strathclyde
University of Swansea

(f) the highest views to current articles; (g) the lowest views to old articles; (h) the highest number of sessions with over 20 views; (i) the lowest number of sessions with views to over 20 unique journals; (j) the highest number of sessions with one search conducted and the lowest number of sessions with five or more searches; (k) the highest number of searches with zero hits returned; and (l) the lowest average number of articles viewed.

Surprisingly, there have been no other significant log studies published on the use of e-journals in the UK. In this data vacuum it is worth mentioning some of the key findings of user surveys covering e-journals. Finholt and Brooks [5] surveyed economics and history faculties at the University of Michigan and found that historians viewed abstracts of e-journal articles less than economists. Smith [6] found that science faculty members generally made more use of e-journals than those from the social science faculty. Studies have also pointed to a general tendency among online journal users to search rather than browse [7, 8]. A survey of users of the Finnish National Electronic Library (FinELib) also confirmed that in the electronic information environment, browsing journals is generally replaced by subject searching in databases. However, the importance of the various search methods varied significantly between disciplinary groups. Keyword searching in journal databases had a significantly more important role in the natural sciences and medicine than in other disciplines: in the former group about three-quarters rated journal searching as important, whereas in the latter a little over half considered it important. Although browsing was rated as a more important search method in the social sciences, economics and medicine than in other disciplines, the differences between individual disciplines were not statistically significant. Chaining was a significantly more important search method in economics and engineering compared with the humanities and medicine; in the humanities this method was surprisingly unimportant compared, for instance, with the social sciences [9].

Another FinELib survey revealed further disciplinary differences. That study showed that while the proportion of those using mainly electronic material had grown in the humanities and social sciences by, respectively, only 7 and 17% during 2000–2005, the growth in other disciplines varied between 38 and 53%. While the difference between the humanities and social sciences and other disciplines in 2000 was 10–20% in terms of e-users, by 2005 it had reached 45–60%. Humanists were the least frequent users, social scientists formed a middle group, and the representatives of all other disciplines were the most frequent users. The study did not present details on the nature of use or on the information seeking behaviour of users [10].

4. Methodology

This study results from the analysis of OJ server logs for the full year 2007. A total of 211 journal titles and 505 million records were represented in the logs. Each record represents a request for a web page or PDF document.


Log files support the maintenance and administration of the web server and content management mechanism. Thus, like the allocation of IP addresses and the Domain Name Service, they are a technical mechanism co-opted for the purposes of market and behavioural analysis. There are certain limitations and difficulties in this approach that should be noted. Log files do not record the content that was sent by the server and (implicitly) received by the end user, but rather the request the server received from a putative user. There is the problem of separating log data about real end users from noise: ancillary content, internal content management processes and (not always benign) web-crawler robots. And then there is the question: what did the user intentionally request and what did he or she obtain?

In the past the mechanics were relatively simple: a web address typed into a browser, or followed from a link, mapped to a file on the server; the server delivered that file; and the browser displayed it. That model does not describe how most of today’s web content is delivered. The web page address (the URI) does not map directly to a file but is a call to a program. The server’s program, part of a content management system, retrieves content from a database, probably in multiple locations, and composes a ‘page’ to meet that request. That page probably contains program code; what the end user sees is not determined until that code is run in the browser. But the log still records the request received, not what the server supplied. Nor can we be sure how it appeared to the user. (We can infer that something did appear, otherwise the log would note ‘error 404’ or similar if the request could not be satisfied.)

Log analysis is thus, like any data mining, the task of building a classifier: of discovering a correspondence between the request recorded in the log and what the user sought. It can also be seen as a process of reverse engineering: inferring the workings of the content management system from a combination of log records and the observed behaviour of the website. The log describes – somewhat cryptically – what the user asked; we need to analyse the answer they were given.

2007-05-02 05:35:59.253 content 170.x.x.108 /cgi/content/abstract/18/1/47 http://scholar.google.com/scholar?hl=en&lr=&q=Muscle+weakness+and+drugs+in+elderlyRjiFrqtCeaQAABroIhc;m9ueiwhgs1.JS1 “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)”

Above is an example log record: we have the date (2007-05-02), the time (05:35:59.253), the internet address (170.x.x.108) and the document requested (abstract/18/1/47). In this case we also know that the link was from Google, as the result of a search for ‘muscle weakness and drugs in elderly’. The string ending ‘.JS1’ is a session cookie that may enable the identification of records relating to the same user. There is a separate series of log files for each journal, and this record is for Age and Ageing. The format of the document URI can be decoded as a request for the abstract of the article beginning on page 47 of volume 18, issue 1. By reference to other data available from OJ, we can identify this as [C. Wickham, C. Cooper, B.M. Margetts and D.J.P. Barker, Muscle strength, activity, housing and the risk of falls in elderly people, Age and Ageing 18 (1989) 47–51; doi:10.1093/ageing/18.1.47]. We can thus determine the age of the article.

From the internet address we can identify the organization providing the internet connection and, implicitly, the location of the requesting browser. There are two possible paths to this identification. A domain name such as www.ucl.ac.uk is an index pointing to an internet address, and a reverse DNS lookup can thus translate 128.40.47.21 to www.publishing.ucl.ac.uk. However, this method has several limitations. The system was designed as a mnemonic means of identifying a server, a computer that accepts incoming requests from a browser; the reverse lookup back to the browser is not essential and is not always possible. In addition, the process is very slow: to identify all the 12 million internet addresses found in the OJ logs takes over eight hours. The alternative method is to identify the blocks of internet addresses allocated to institutions by the Internet Assigned Numbers Authority (IANA.org) and Réseaux IP Européens (RIPE.net). This method is particularly well suited to the present study. It is fast: the same 12 million IP addresses can be processed in less than two minutes. It is accurate and complete: all the institutions selected for the study have been allocated exclusive blocks of IP addresses, whereas only 75% of those addresses return an rDNS response. And these blocks are the basis of OJ’s own subscription access control.
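As an illustration of the decoding just described, here is a minimal sketch, in Python, of how such a record might be parsed and attributed to an institution via allocated address blocks. The field layout and regular expressions are assumptions inferred from the single example record above (the OJ log format is not formally specified in the paper), and UCL’s 128.40.0.0/16 block is inferred from the 128.40.47.21 example; real allocations would come from IANA/RIPE records.

```python
import ipaddress
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed layout of one OJ log record, based on the example above:
# date, time, record kind, IP address, document URI, referrer.
LOG_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<kind>\S+) (?P<ip>\S+) (?P<uri>\S+) (?P<referrer>\S+)"
)

# URIs such as /cgi/content/abstract/18/1/47 decode to
# (page type, volume, issue, first page of the article).
URI_PATTERN = re.compile(
    r"^/cgi/content/(?P<type>\w+)/(?P<volume>\d+)/(?P<issue>\d+)/(?P<page>\d+)$"
)

def parse_record(line: str) -> Optional[dict]:
    """Split one raw log line into named fields; None if unrecognized."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    doc = URI_PATTERN.match(fields["uri"])
    if doc:
        fields.update(doc.groupdict())
    # If the visit came from a Google search, recover the search terms.
    ref = urlparse(fields["referrer"])
    if ref.hostname and "google" in ref.hostname:
        fields["query"] = parse_qs(ref.query).get("q", [""])[0]
    return fields

# The IP-block alternative to reverse DNS: map an address to an institution
# by membership of an allocated block (illustrative entry only).
INSTITUTION_BLOCKS = {
    "UCL": ipaddress.ip_network("128.40.0.0/16"),
}

def institution_for(ip: str) -> Optional[str]:
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:  # e.g. anonymized addresses such as 170.x.x.108
        return None
    return next(
        (name for name, net in INSTITUTION_BLOCKS.items() if addr in net), None
    )
```

Because block membership is a simple numeric comparison, this lookup is effectively instantaneous per address, which is consistent with the two-minute versus eight-hour difference reported above.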


The OJ 2007 log contained 505,181,345 records. After elimination of non-journal logs and badly formed records this was reduced to 496,684,204. The present study focused in detail on 61 journals in three subject categories, a total of 180,060,545 logged requests. From these we discarded records associated with content management and access control (3.7%), ancillary and duplicate responses to a single request (19.3%) and invalid or unfulfilled requests (404 errors, 4.3%). A total of 124,658,608 valid records (69.2%) were left.

Raw logs generally contain a significant volume of requests originating from web-crawlers and robots, not all of them benign. As the study was restricted to 10 institutions, no special measures were considered necessary to filter out these data; it was assumed that the IP networks allocated to these institutions were not a significant source of such activity. The exception was the ‘LOCKSS cache’. LOCKSS (www.lockss.org) is described as a ‘digital preservation appliance’; it maintains a copy of digital content that can preserve access if the publisher’s digital copy becomes unavailable. What is not entirely clear from the descriptions of how it works is how we should interpret its presence in the log entries. On the one hand: ‘It acts as a web proxy or cache, providing browsers in the library’s community with access to the publisher’s content or the preserved content’. This would suggest that requests from an institution by the ‘LOCKSS cache’ user agent might represent real users at the institution and should therefore be included in the analysis. We might also need to consider that such a proxy might bias the sample by under-reporting the use of the most popular content. On the other hand: ‘When a request for a page from a preserved journal arrives, it is first forwarded to the publisher. If the publisher returns content, that is what the browser displays. Otherwise the browser displays the preserved copy’. On this model the ‘LOCKSS cache’ can be considered autonomous additional traffic, communicating with other LOCKSS boxes to maintain a mirror that is only used if the publisher’s site is unavailable.

The distinction is important because the volume of LOCKSS traffic evident in the logs is very high: more than 12% of all log entries which would otherwise count as valid page views are from the ‘LOCKSS cache’ user agent. (Only abstracts, full-text HTML and PDFs appear affected, and for this content the proportion rises to more than 50%.) The four largest institutions in this study, University College London (UCL), Manchester (MAN), Cambridge (CAM) and Edinburgh (ED), all had a LOCKSS box, and the percentage of usage varied over time. The pattern suggests that it may be best to view the LOCKSS box as a mirror of last resort rather than a proxy: when started, the box generates a lot of traffic as it builds a content mirror, but none of this is directly related to user activity at the same institution. The data were analysed with and without the page requests originating from a LOCKSS box; the data presented here exclude LOCKSS. They may thus under-report activity at the largest institutions for the most popular content, but have the benefit of being, with greater certainty, a study of the activity of real users. With the exclusion of items carrying a ‘LOCKSS cache’ user-agent identifier, it is assumed that all requests identified by an IP address managed by an institution represent a real user (a sketch of this cleaning step follows).
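The cleaning rules above reduce to a simple per-record predicate. This is a sketch only: the field names (‘status’, ‘user_agent’, ‘ip’) are assumptions carried over from the hypothetical parse_record()/institution_for() sketch earlier, not the actual schema of the OJ logs.

```python
from typing import Iterable, List

def keep(record: dict) -> bool:
    """True if a parsed record should count as real user activity."""
    if record.get("status") == "404":                    # unfulfilled requests
        return False
    if "LOCKSS cache" in record.get("user_agent", ""):   # autonomous LOCKSS traffic
        return False
    # Restrict to the case study institutions' allocated IP blocks
    # (institution_for() is the block lookup sketched earlier).
    return institution_for(record.get("ip", "")) is not None

def clean(records: Iterable[dict]) -> List[dict]:
    """Apply the predicate to a stream of parsed records."""
    return [r for r in records if keep(r)]
```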
After these exclusions, the valid records for users of the selected journals from the case study institutions came to a total of 642,653 page requests. Some 6.2% of these valid records were unclassified: minor pages within each journal’s website and other content that was difficult to classify. The important working definitions adopted for the analysis are as follows:

• Sessions. A series of page requests that can be identified, on the basis of time, IP address and cookies, as probably originating from the same user in a continuous linked sequence (a grouping sketch is given below). Each journal has a separate log, so a session covers only one journal. Most sessions involve a request for a download, possibly via a few intermediate pages.

• Page viewed. A ‘complete’ item returned by the server to the client in response to a user action. Typically this might be an abstract, an article or a table of contents. A complete item might comprise all the pages, charts, etc. of an article; this is recorded as a single item.

It should be noted that the subject classification used here is the subject category used by Oxford University Press (OUP) to categorize its journals. In other words, we define subject disciplines by the subject of the journal, not the subject of the reader, so we learn about readings rather than readers. Interdisciplinary readers are therefore treated as if they were disciplinary readers. This is due to a limitation of the log data.
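A sketch of the session definition above: requests sharing an IP address and session cookie, with no long gap between consecutive requests, are grouped into one session. The 30-minute inactivity cut-off and the datetime ‘timestamp’ field are our assumptions for illustration; the paper does not state the exact threshold used.

```python
from collections import defaultdict
from datetime import timedelta

GAP = timedelta(minutes=30)  # assumed inactivity cut-off between sessions

def sessionize(records):
    """Group parsed records into sessions keyed by (IP address, cookie)."""
    by_user = defaultdict(list)
    for r in records:
        by_user[(r["ip"], r.get("cookie"))].append(r)
    sessions = []
    for requests in by_user.values():
        requests.sort(key=lambda r: r["timestamp"])
        current = [requests[0]]
        for prev, nxt in zip(requests, requests[1:]):
            if nxt["timestamp"] - prev["timestamp"] > GAP:
                sessions.append(current)  # close the session at a long gap
                current = []
            current.append(nxt)
        sessions.append(current)
    return sessions
```

Session length in seconds and pages viewed per session, the ‘busyness’ metrics reported below, fall straight out of these groupings.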


Table 2 Subject usage metrics

Subject                               | Total page views n (%) | Total full-text views n (%) | Total HTML views n (%) | Total PDF views n (%) | Sessions n (%)  | Average session (seconds)
Life sciences (51% of journals)       | 509,457 (80.7)         | 240,037 (84.3)              | 101,094 (94.9)         | 138,943 (78.0)        | 192,724 (82.6)  | 262
Economics (31% of journals)           | 66,827 (10.6)          | 25,099 (8.8)                | 2567 (2.4)             | 22,532 (12.6)         | 24,490 (11.5)   | 261
History (18% of journals)             | 54,857 (8.7)           | 19,598 (6.9)                | 2921 (2.7)             | 16,677 (9.4)          | 16,154 (6.9)    | 262
All case study subjects (61 journals) | 631,141 (100.0)        | 284,734 (100.0)             | 106,582 (100.0)        | 178,152 (100.0)       | 233,368 (100.0) | 262

5. Results

5.1. Subject comparisons

Three subjects featured in the analysis – life sciences, economics and history – providing an insightful comparison between science, social science and humanities subjects. For the purposes of the analysis, case study subjects were defined by the OJ subject category of the journal used, rather than by department name. The reasons for this were: (a) it was not possible to identify the subjects of the departments accurately from the logs or from the sub-network labels, which otherwise offered the best means of doing this; (b) it allows for documentary scatter, whereby a good proportion of a department’s publications appear in journals outside the subject of the researcher’s home department because of widespread collaborative and problem-driven research; (c) it allows for the subject scatter of usage, which arises from the blurring of disciplines; and (d) this blurring results partly from the changing nature of research and partly from the primacy of multidisciplinary information platforms like OJ.

5.1.1. Usage

Over half a million pages were viewed over the 12 months (Table 2). Life sciences proved to be a giant in terms of usage, accounting for over 80% of all the page views recorded by the three case study subjects. Admittedly, life sciences were pitted against subjects that had relatively small journal populations: history had 11 journals, economics 19 and life sciences 31. Even so, life sciences more than punched their weight, as they accounted for only around half of the journals. History accounted for less than 10% of use, yet submitted about 40% of the number of researchers that life sciences did in RAE 2007 (Table 3); presumably historians’ preference for the scholarly monograph explains this. Life sciences usage would have been boosted considerably by the presence of the two government laboratories, whose 250-odd scientists were largely biological scientists. Other highlights of the table were: (a) the much higher viewing of articles in HTML format by life scientists; (b) the relatively strong showing of history compared with economics, which had about three-quarters more journals but a smaller body of researchers as defined by RAE 2007 (Table 3); (c) average session times were quite uniform.

In regard to monthly patterns of use (Table 4), for economics and history November proved to be the month in which most use took place (nearly 15% of annual use). For economics and history, respectively, September and August (summer vacation months) were the quietest months, when less than 5% of use occurred. For life sciences use was generally more even throughout the year, with October being the peak month (nearly 11% of use) and December the trough month (around 6% of use).


Table 3 Number of staff submitted in the RAE 2007

Institution | Economics: Number (Rank) | Life sciences (incl. agricultural and veterinary): Number (Rank) | History: Number (Rank)
Aberdeen    | 14 (5)  | 67 (4)   | 26 (4)
Bangor      | n/a     | 52 (5)   | 17 (6)
Cambridge   | 38 (1)  | 253 (1)  | 102 (1)
CEH         | n/a     | n/a      | n/a
Edinburgh   | n/a     | n/a      | n/a
Manchester  | 35 (2)  | 107 (2)  | 34 (3)
Rothamsted  | n/a     | n/a      | n/a
Swansea     | 18 (4)  | 16 (6)   | 18 (5)
Strathclyde | n/a     | n/a      | 15 (7)
UCL         | 32 (3)  | 79 (3)   | 49 (2)
Total       | 119     | 558      | 228

Table 4 Use over time (monthly percentage of page views)

Subject                     | Jan | Feb  | Mar  | Apr | May  | Jun | Jul | Aug | Sep | Oct  | Nov  | Dec
Life sciences (%)           | 8.8 | 8.9  | 10.0 | 7.8 | 8.9  | 7.0 | 7.2 | 7.3 | 6.8 | 10.6 | 10.4 | 6.3
Economics (%)               | 8.9 | 10.7 | 9.9  | 8.7 | 10.2 | 4.7 | 4.3 | 5.2 | 4.2 | 11.4 | 14.7 | 7.2
History (%)                 | 9.5 | 9.9  | 9.7  | 7.1 | 8.8  | 5.9 | 4.2 | 3.4 | 4.7 | 13.9 | 14.9 | 7.9
All case study journals (%) | 9.0 | 9.2  | 9.9  | 7.8 | 9.0  | 6.7 | 6.6 | 6.7 | 6.3 | 10.9 | 11.2 | 6.6

If we turn to day-of-the-week patterns (Table 5), what is perhaps most noticeable is the strong variation in weekend usage, accounting for 17.5% of use in the case of economists but only 8.4% in the case of life scientists. This means the former were effectively finding another working day by working weekends. The subject difference could be connected to the fact that it is much easier for economists to work from home (they do not need laboratories), but this does not fully explain why they do so more than historians (13.3% of use at weekends), although the fact that historians are more book (hence library) oriented might have something to do with it. (A big factor is that the government laboratories only work Monday to Friday, 9–5.) Table 6 shows what time of day researchers searched the database, and it too shows that researchers, liberated from the physical library, conduct much research work out of office hours, and probably at home. Thus economists conducted approaching a third (29.3%) of their searching between the hours of 6 p.m. and 6 a.m. The equivalent figure for life sciences was 16.5%, a substantial difference.

5.1.2. Information seeking behaviour (session data)

Logs provide a comprehensive and detailed view of how researchers access a site and, once there, how they navigate around it. In this regard they provide us with three types of information: (a) where users were and what they might have been doing prior to arriving at OJ, something which enhances our knowledge of the broader web session of which the OJ search might have been a part; (b) where they actually arrived in the site and what kind of view of the OJ content they obtained as a consequence, something which might influence what they use; and (c) their searching and navigating styles (browsing vs searching).


Table 5 Use over time (average day of the week page views)

Subject                 |   | Mon     | Tue     | Wed     | Thu     | Fri    | Sat    | Sun
Life sciences           | n | 96,422  | 99,558  | 96,044  | 94,379  | 80,269 | 20,561 | 22,224
                        | % | 18.9    | 19.5    | 18.9    | 18.5    | 15.8   | 4.0    | 4.4
Economics               | n | 11,776  | 11,909  | 11,066  | 10,754  | 9620   | 5861   | 5841
                        | % | 17.6    | 17.8    | 16.6    | 16.1    | 14.4   | 8.8    | 8.7
History                 | n | 9964    | 10,323  | 9238    | 10,052  | 7951   | 3536   | 3793
                        | % | 18.2    | 18.8    | 16.8    | 18.3    | 14.5   | 6.4    | 6.9
All case study subjects | n | 118,162 | 121,790 | 116,348 | 115,185 | 97,840 | 29,958 | 31,858
                        | % | 18.7    | 19.3    | 18.4    | 18.3    | 15.5   | 4.7    | 5.0

Table 6 Use over time (average hourly page views)

Subject                 |   | 12–6 a.m. | 6–9 a.m. | 9–12 a.m. | 12–3 p.m. | 3–6 p.m. | 6–9 p.m. | 9–12 p.m.
Life sciences           | n | 12,706    | 9471     | 112,670   | 150,338   | 152,975  | 50,254   | 21,043
                        | % | 2.5       | 1.9      | 22.1      | 29.5      | 30.0     | 9.9      | 4.1
Economics               | n | 4379      | 1301     | 10,195    | 17,417    | 18,365   | 9394     | 5776
                        | % | 6.6       | 1.9      | 15.3      | 26.1      | 27.5     | 14.1     | 8.6
History                 | n | 1760      | 820      | 9949      | 17,185    | 15,968   | 5719     | 3456
                        | % | 3.2       | 1.5      | 18.1      | 31.3      | 29.1     | 10.4     | 6.3
All case study subjects | n | 18,845    | 11,592   | 132,814   | 184,940   | 187,308  | 65,367   | 30,275
                        | % | 3.0       | 1.8      | 21.0      | 29.3      | 29.7     | 10.4     | 4.8

5.1.3. Navigating towards content

Oxford Journals opened their content to Google in 2004 and the popularity of Google is there for all to see (Table 7). Around 40% of sessions arose from a Google search. Clearly Google is an important tool for researchers and, perhaps surprisingly, it was most popular with historians, who might have been regarded as the more traditional group: 45% of their sessions arose from a Google search. Google Scholar proved relatively popular too, especially in the case of economics, where nearly a quarter (22%) of sessions originated from it. Advanced searching proved a little more popular than has been found elsewhere [1, 9], but not by much: even its biggest proponent, history, saw just 2% of sessions using it. Basic (internal) search engine searching proved a little more popular, though nowhere near as popular as web search engine searching; historians proved to be the biggest users of the facility, with 10% of sessions seeing it employed. Menu (tables of contents, lists of journals and subjects) searching or browsing provided the most popular way of finding content once in OJ. It proved most popular in the case of historians, 43% of whose sessions featured a view to a menu; this contrasts with the life sciences figure of 16%, which might be explained by the fact that life scientists are big users of gateway sites, like PubMed, so that much of their browsing (and searching) is conducted off-site.

5.1.4. Content viewed

Despite the fact that a good proportion of history journals have abstracts, very few historians viewed an abstract during their visit (Table 8).


Table 7 Method of access and navigation (sessions). Percentage of all sessions for that subject

Subject                 | Google access n (%) | Google Scholar n (%) | Advanced search n (%) | Basic search n (%) | Menus n (%)
Life sciences           | 74,230 (39)         | 3291 (2)             | 3904 (1)              | 13,118 (3)         | 79,392 (16)
Economics               | 8980 (37)           | 5332 (22)            | 757 (1)               | 3115 (5)           | 22,341 (33)
History                 | 7325 (45)           | 1313 (8)             | 1170 (2)              | 5293 (10)          | 23,755 (43)
All case study subjects | 90,535 (39)         | 9936 (4)             | 5831 (1)              | 21,526 (3)         | 125,488 (20)

Table 8 Content viewed (sessions)

Subject       | Average no. of pages viewed | Average no. of articles viewed | Percentage viewing an abstract | Median age of article viewed (months)
Life sciences | 2.7                         | 1.1                            | 16.0                           | 48
Economics     | 2.7                         | 1.1                            | 18.3                           | 73
History       | 2.8                         | 1.1                            | 5.2                            | 90

Three times as many life science and economics sessions viewed an abstract. Maybe an abstract is less useful in arts subjects: in the sciences it is probably fairly clear from the abstract how the results or conclusions of a paper fit with one’s own research programme, and challenging the argument in any depth would require repeating the experiment. For a history paper the in-depth argument is more likely to lie within the text itself, so historians would be more likely to read (or skim) the whole paper rather than an abstract. The greater use of abstracts by economists might well be explained by life scientists viewing their abstracts off-site on a gateway site.

The average number of pages viewed in a session (a ‘busyness’ metric) was low when compared with what has been seen elsewhere [1], and uniform at around 2.7 pages a session. The average number of articles viewed was low (1.1) and also very uniform. Taken together with the short session times we have noted earlier, it is clear that highly focused information seeking was being conducted.

In terms of the age of the article viewed there were the expected differences along discipline lines, with life sciences viewing the most recent ones (median four years); for economics the median was six years and for history seven and a half years. For life sciences, 25% of articles were no more than 16 months old, 25% were over 104 months old, and the oldest was 97 years. There was obviously more demand for older articles by historians, and the history decay curve took longer to level off; but where back numbers are provided online (and OJ is very good in this respect) there was real demand for them, and this was the case in the life sciences too. Perhaps because economics is a younger discipline, or because OUP has not been publishing economics journals for as long, use ‘only’ went back 60 years. Nevertheless, we still see demand for older articles in economics, with 75% of the articles used being 26 months old or older; that is higher than for history, where the lower quartile comes in at 23 months.


Table 9 Top 20 journals

Rank | Life sciences                                         | Economics                                    | History
1    | Nucleic Acids Research                                | Oxford Review of Economic Policy             | English Historical Review
2    | Human Molecular Genetics                              | Cambridge Journal of Economics               | History Workshop Journal
3    | Bioinformatics                                        | Review of Finance                            | Social History of Medicine
4    | Cerebral Cortex                                       | Industrial and Corporate Change              | Twentieth Century British History
5    | Molecular Biology and Evolution                       | Oxford Economic Papers                       | Journal of the History of Medicine and Allied Sciences
6    | Journal of Experimental Botany                        | World Bank Economic Review                   | French History
7    | Carcinogenesis                                        | World Bank Research Observer                 | Journal of Semitic Studies
8    | Journal of Biochemistry                               | Journal of Economic Geography                | Journal of Islamic Studies
9    | Annals of Botany                                      | Journal of Financial Econometrics            | Holocaust and Genocide Studies
10   | International Immunology                              | European Review of Agricultural Economics    | Journal of the History of Collections
11   | Behavioral Ecology                                    | Contributions to Political Economy           | Journal of Design History
12   | Integrative and Comparative Biology                   | Socio-Economic Review                        |
13   | Journal of Petrology                                  | Journal of Competition Law and Economics     |
14   | Plant and Cell Physiology                             | CESifo Economic Studies                      |
15   | Protein Engineering, Design and Selection             | American Law and Economics Review            |
16   | MHR: Basic Science of Reproductive Medicine           | Review of Environmental Economics and Policy |
17   | Toxicological Sciences                                |                                              |
18   | Forestry: An International Journal of Forest Research |                                              |
19   | Glycobiology                                          |                                              |
20   | Journal of Plankton Research                          |                                              |

being viewed were; after all, we had found it to be more like two years in the case of ScienceDirect [1]. There are a number of possible explanations: (a) OJ has an extensive archive, going back for some journals to the 1850s, and there is plenty of evidence of a long tail of use; (b) the particular selection of journals skews use. There was little difference by subject in article decay (the fall in use by month). We would posit that there are two markets: a mass market that is mostly looking for the latest material, and a smaller specialist market whose research may send users to all and any material available. The big institutions have a big mass market, but once the curve levels off a lone researcher in the smallest of institutions is as likely as one in the largest to be looking at the long tail. If we look at individual titles we can see clearly that some have been around longer than others, and that the age-related demand curve does vary by title.

5.1.5. Journals used

Table 9 lists the most used journals for each field. Nucleic Acids Research (life sciences), the Oxford Review of Economic Policy and the English Historical Review topped their respective subject rankings. Nucleic Acids Research is a truly huge journal with numerous supplements and has an ISI journal impact factor of 6.954 (2008); the Oxford Review was interestingly not the top IF-ranked journal in economics (0.552) – the Cambridge Journal of Economics was higher (0.7); and the English Historical Review has no impact factor, the Social History of Medicine having the highest in history (0.809).


Table 10 History usage metrics

Institution | Total page views n (%) | Total full-text views n (%) | Total HTML views n (%) | Total PDF views n (%) | Sessions n (%) | Average session (seconds)
Aberdeen    | 1685 (3.1)    | 859 (4.4)   | 145 (5.0)   | 714 (4.3)   | 779 (4.8)   | 275
Bangor      | 3429 (6.3)    | 836 (4.3)   | 128 (4.4)   | 708 (4.2)   | 656 (4.1)   | 307
Cambridge   | 15,505 (28.3) | 5573 (28.4) | 1001 (34.3) | 4572 (27.4) | 4891 (30.3) | 266
Edinburgh   | 6393 (11.7)   | 2313 (11.8) | 366 (12.5)  | 1947 (11.7) | 1841 (11.4) | 286
Manchester  | 13,927 (25.4) | 4793 (24.5) | 548 (18.8)  | 4245 (25.5) | 3998 (24.7) | 254
Strathclyde | 1540 (2.8)    | 546 (2.8)   | 75 (2.6)    | 471 (2.8)   | 418 (2.6)   | 189
Swansea     | 1907 (3.5)    | 800 (4.1)   | 128 (4.4)   | 672 (4.0)   | 632 (3.9)   | 310
UCL         | 5194 (9.5)    | 2066 (10.5) | 303 (10.4)  | 1763 (10.6) | 1749 (10.8) | 243
CEH         | 0 (0.0)       | 0 (0.0)     | 0 (0.0)     | 0 (0.0)     | 0 (0)       | 0
Rothamsted  | 22 (0.0)      | 3 (0.0)     | 0 (0.0)     | 3 (0.0)     | 6 (0)       | 255

However, what is of most interest perhaps is the fact that all of the top five life science journals operated in an open access (OA) mode of one form or another, something that would clearly increase usage from among those case study institutions without subscriber access. For history, three OA journals featured in the top five: Social History of Medicine, Twentieth Century British History and the Journal of the History of Medicine and Allied Sciences. It was different for economics, where only the Review of Finance was OA.

5.2. Institutional diversity

History has been chosen to illustrate the diversity that can be found among institutions researching the same subject. It is necessary to prefix this analysis with a warning, as the institutions had different access arrangements to OJ, which have an impact on comparisons. Thus Bangor and the University of Manchester had access to all history titles, Aberdeen had access to selected titles across the various subject fields, and the Centre for Ecology and Hydrology (CEH) had none. The other institutions had regular subscriptions to specific titles, which will be established in the second phase of the project. In addition, OJ provides some of its content for free: (a) some titles make their recent archives freely available after a fixed period of time; (b) some titles are completely OA; (c) some are hybrid subscription/OA journals, where some articles are freely available to all and some only to subscribers. Also, in 2006 OUP concluded a national deal with the Joint Information Systems Committee (JISC) which allowed any higher education institution in the UK to register for access to the digitized archive (pre-1996 content). We believe most institutions signed up for this, something confirmed by the high use of older issues discussed earlier. Table 3 provides some background on the case studies regarding the relative size of the user community and its research status, which will help in understanding some of the data.

5.2.1. Use

Cambridge submitted the most staff to the 2007 RAE in history and so, unsurprisingly, proved to be the super user in history, accounting for around a third of usage on most of the metrics (Table 10). Manchester was a clear second, and it submitted the third most staff to RAE 2007; UCL submitted the second most staff, so it is striking that it is so far down the rank order of use. As might be expected, the government laboratories made virtually no use of history journals and their data have been excluded from most analyses. Swansea and Bangor conducted the longest sessions, and the big users, like Cambridge, tended to conduct short sessions; however, Strathclyde actually conducted the shortest sessions.


Table 11 Use over time in history (monthly percentage of page views)

Institution | Jan  | Feb  | Mar  | Apr  | May  | Jun | Jul  | Aug | Sep | Oct  | Nov  | Dec
Aberdeen    | 7.0  | 7.1  | 7.1  | 15.1 | 9.1  | 4.4 | 4.9  | 3.0 | 4.9 | 17.1 | 12.9 | 7.5
Bangor      | 11.3 | 7.5  | 13.4 | 5.1  | 4.1  | 3.0 | 4.4  | 2.7 | 3.0 | 15.9 | 20.2 | 9.4
Cambridge   | 9.1  | 10.3 | 9.7  | 7.4  | 10.8 | 7.3 | 4.9  | 3.3 | 4.7 | 12.1 | 13.8 | 6.6
Edinburgh   | 11.6 | 10.6 | 7.8  | 4.7  | 5.0  | 4.0 | 3.9  | 5.4 | 6.6 | 17.4 | 15.8 | 7.0
Manchester  | 8.9  | 10.8 | 11.5 | 8.6  | 11.2 | 5.1 | 2.6  | 3.3 | 5.0 | 10.3 | 15.1 | 7.7
Strathclyde | 7.9  | 14.4 | 11.8 | 5.0  | 7.0  | 2.7 | 3.6  | 3.8 | 3.4 | 17.1 | 16.7 | 6.5
Swansea     | 15.9 | 7.3  | 8.8  | 3.8  | 12.9 | 6.3 | 0.9  | 1.9 | 5.3 | 18.5 | 11.8 | 6.6
UCL         | 7.3  | 10.4 | 8.0  | 6.6  | 7.9  | 6.2 | 5.5  | 4.3 | 2.6 | 17.6 | 15.5 | 8.2
Rothamsted  | 0    | 9.1  | 0    | 0    | 4.5  | 0   | 81.8 | 0   | 0   | 0    | 0    | 4.5

Table 12 Use over time in history (average day of the week of sessions, %)

Institution | Mon  | Tue  | Wed  | Thu  | Fri  | Sat | Sun
Aberdeen    | 16.4 | 18.6 | 20.4 | 18.2 | 17.1 | 4.7 | 4.5
Bangor      | 16.2 | 16.9 | 15.1 | 19.2 | 14.9 | 5.9 | 11.7
Cambridge   | 17.8 | 18.4 | 17.2 | 18.7 | 16.7 | 4.9 | 6.3
Edinburgh   | 19.6 | 18.4 | 17.7 | 19.9 | 14.6 | 4.7 | 5.1
Manchester  | 18.5 | 18.9 | 19.2 | 17.3 | 14.3 | 7.4 | 4.4
Strathclyde | 16.6 | 19.7 | 19.2 | 19.7 | 15.6 | 3.4 | 5.8
Swansea     | 18.4 | 18.7 | 19.0 | 19.4 | 16.7 | 2.9 | 4.9
UCL         | 17.8 | 21.5 | 18.6 | 18.9 | 14.9 | 4.5 | 3.7
Rothamsted  | 66.7 | 16.7 | 0.0  | 0.0  | 16.7 | 0.0 | 0.0

5.2.2. Use over time

There were big differences in monthly use at the institutional level (Table 11). What stood out most were Swansea’s high usage in January (15.9%) and Bangor’s high usage in November (20.2%). Bangor conducted the most searching at weekends – 17.6% of its use occurred then (Table 12). Weekend levels were generally lower than those found with ScienceDirect [1]: at Cambridge, the super user in this category, 11.2% of history searching was undertaken over the weekend, whereas economists at the same institution conducted 14% of their searching then.

5.2.3. Navigation

Cambridge was the biggest user of Google (employed in 54% of sessions), demonstrating that even top users and researchers appreciate its retrieval powers (Table 13). However, Manchester proved to be the biggest Google Scholar user: over 10% of its sessions arose from its use. At Bangor more than half (51%) of all sessions recorded a menu being viewed; by contrast, for Aberdeen the figure was less than half that (24%). The super users, Cambridge and Manchester, tended to use advanced searching the least. There were large differences in the use of the basic search facility, with Bangor employing it most (15% of sessions) and UCL and Cambridge the least (7%).

5.2.4. Content viewed

Historians at UCL proved to be the busiest users, viewing 3.5 pages and 1.7 articles in an average session (Table 14); this is interesting in light of the overall low use at UCL. That compared to, respectively, 2.3 and 1.0 for Aberdeen.


Table 13 Method of access and navigation in history (sessions). Percentage of all sessions for that institution

Institution | Google access n (%) | Google Scholar n (%) | Menu use n (%) | Advanced search n (%) | Basic search n (%)
Aberdeen    | 402 (52)   | 71 (9)   | 411 (24)   | 61 (4)  | 206 (12)
Bangor      | 189 (29)   | 18 (3)   | 1745 (51)  | 123 (4) | 517 (15)
Cambridge   | 2662 (54)  | 269 (5)  | 7337 (47)  | 218 (1) | 1013 (7)
Edinburgh   | 918 (50)   | 163 (9)  | 2597 (37)  | 190 (3) | 844 (12)
Manchester  | 1310 (33)  | 433 (11) | 6011 (43)  | 291 (2) | 1454 (10)
Strathclyde | 180 (43)   | 41 (10)  | 660 (43)   | 34 (2)  | 169 (11)
Swansea     | 272 (43)   | 53 (8)   | 615 (32)   | 32 (2)  | 228 (12)
UCL         | 874 (50)   | 156 (9)  | 1999 (38)  | 61 (1)  | 387 (7)
CEH         | 0          | 0        | 0          | 0       | 0
Rothamsted  | 5 (83)     | 0 (0)    | 7 (32)     | 2 (9)   | 9 (41)

Table 14 Content viewed (sessions)

Institution | Average no. of pages viewed | Average no. of articles viewed | Percentage viewing an abstract | Median age of article viewed (months)
Aberdeen    | 2.2 | 1.0 | 5.0 | 85
Bangor      | 3.1 | 1.2 | 2.0 | 96
Cambridge   | 2.9 | 1.1 | 4.9 | 110
Edinburgh   | 2.6 | 1.1 | 3.9 | 84
Manchester  | 2.4 | 1.0 | 6.2 | 103
Strathclyde | 2.4 | 1.2 | 4.8 | 91
Swansea     | 2.5 | 1.2 | 6.3 | 89
UCL         | 3.5 | 1.7 | 7.8 | 83

UCL historians were also the greatest users of abstracts (viewed in 7.8% of sessions), whereas historians at Bangor viewed abstracts rarely, in just 2% of sessions. Cambridge viewed the oldest articles (a median of 110 months) and UCL the most recent (83 months).

5.2.5. Journals used

The smaller journal pool for history meant that most institutions viewed the same titles (Tables 15 and 16). The English Historical Review was clearly top ranked in usage terms by the institutions that made use of history journals, featuring in the top five journals of 10 institutions. The History Workshop Journal (9 institutions) and the Social History of Medicine (8) were also popular.


Table 15 History top 10 journals, by institution (Aberdeen, Bangor, Cambridge, Edinburgh, Manchester, Strathclyde, Swansea, UCL and Rothamsted). Each institution’s list drew on the same 11 history titles, led by the English Historical Review, the History Workshop Journal and the Social History of Medicine; the frequency with which titles appeared in institutions’ top five lists is summarized in Table 16. [Per-institution rankings not reproducible from the degraded multi-column source.]


Table 16 Journals featuring in the top five lists of two or more institutions

History title                                           | Number of universities
English Historical Review                               | 10
History Workshop Journal                                | 9
Social History of Medicine                              | 8
Twentieth Century British History                       | 8
Journal of the History of Medicine and Allied Sciences  | 6
French History                                          | 4
Holocaust and Genocide Studies                          | 2

6. Conclusions

The study’s key finding in respect to subject and institutional differences was that they were sometimes considerable, a finding which points to the danger of generalizing about usage and information seeking at the broad subject level. In terms of usage, the sheer scale of life science journal use stood out – and log analysis provides an exact measure of that scale, making it much easier to see. Life science journals accounted for around 80% of OJ usage in the three case study subjects, and this was not simply a reflection of the size of the available literature or of the population of researchers. The parallel ScienceDirect study confirmed the scale of life science journal use, and other studies, such as Smith [6], have also found that science faculty members generally made more use of e-journals than those from the social sciences.

Few researchers have analysed patterns of use over time in this amount of detail, and the results provide insightful reading, demonstrating, for instance, that: (a) life science use was the most even throughout the academic year; (b) substantial use goes on outside conventional office hours and over weekends, although not so much in the case of life scientists, perhaps because they are more tied to the work bench.

In terms of navigating and finding content there were surprises as well as big differences. Google proved to be an extremely popular means of accessing OJ content, being responsible for 40% of sessions, more in the case of historians (45%), with whom it was especially popular. Historians were perhaps surprising in other ways, as they used advanced searching, menus and basic searching the most; indeed, historians seemed to be very active searchers. The life scientists’ searching activity is probably understated in the logs because much of it is likely to have gone on in third-party gateway sites, like PubMed. Interestingly, the super users, Cambridge and Manchester, tended to use advanced searching the least.

When it came to the volume and type of content and the journals viewed there was also a good deal of diversity, with: (a) historians clearly not enamoured of abstracts, which seems to have been true for quite some time [5]; (b) OA journals featuring strongly in the ranked lists for life sciences and history but not for economics. However, there were strong similarities too, most notably the extremely focused and fast nature of use: very few pages were viewed (typically three), most sessions saw only one or two journals viewed, and session times averaged four to five minutes.

This study provides support for previous studies in the following respects:

• research-intensive universities are characterized by high-volume use and short session times, light sessions (those viewing few pages and articles), and sessions utilizing few of the search functions available;
• life science journals attract the highest volume of use;
• readers of economics journals are among the biggest users of abstracts;
• life science users make relatively low use of abstracts on publisher platforms.


Logs, by providing evidence, raise the important questions that need to be asked; to discover what all these very interesting data actually mean, it is necessary to survey the individual researchers involved, which is what will happen during the second phase of the project, running from April 2009 to January 2010. Clearly our conclusions are based on the study of one publisher’s journals at 10 research institutions. However, this is mitigated by the facts that: (a) no previous British study has examined logs on a 24/7 basis for a full 12 months of the year, so in this regard the study provides an unparalleled wide-angle view of information usage and seeking; (b) the findings of this study are almost identical to those found for ScienceDirect (www.rin.ac.uk/use-ejournals), another publisher’s platform studied as part of the same RIN project, which suggests that these findings can be generalized across the piece; and (c) it addresses the information seeking behaviour of successful researchers, about which we know little and yet need to know a lot if we are to begin to address what information seeking best practice might be.

Acknowledgement The authors would like to thank the Research Information Network for funding the project.

References

[1] CIBER, Evaluating the usage and impact of e-journals in the UK. Working paper 5. Information usage and seeking behaviour: subject and institutional profiles (UCL, London, 2009). Available at: www.ucl.ac.uk/infostudies/research/ciber/ (accessed 1 March 2009).
[2] C. Tenopir, P. Wang, R. Pollard, Y. Zhang and P. Simmons, Use of electronic journals in the undergraduate curriculum: an observational study, Proceedings of the American Society for Information Science and Technology 41 (2004) 64–71.
[3] D. Nicholas, P. Huntington and H.R. Jamali, Diversity in the information seeking behaviour of the virtual scholar: institutional comparisons, Journal of Academic Librarianship 33(6) (2008) 629–38.
[4] D. Nicholas, P. Huntington and H.R. Jamali, User diversity: as demonstrated by deep log analysis, Electronic Library 26(1) (2008) 21–38.
[5] T.A. Finholt and J. Brooks, Analysis of JSTOR: the impact on scholarly practice of access to on-line journal archives. In: R. Ekman and R.E. Quandt (eds), Technology and Scholarly Communication (University of California Press, Berkeley, 1999) 177–94.
[6] E.T. Smith, Changes in faculty reading behaviours: the impact of electronic journals on the University of Georgia, Journal of Academic Librarianship 29(3) (2003) 162–68.
[7] N.A. Sathe, J.L. Grady and N.B. Giuse, Print versus electronic journals: a preliminary investigation into the effect of journal format on research processes, Journal of the Medical Library Association 90(2) (2002) 235–43.
[8] P. Boyce, D.W. King, C. Montgomery and C. Tenopir, How electronic journals are changing patterns of use, Serials Librarian 46(1–2) (2004) 121–41.
[9] P. Vakkari and S. Talja, Searching for electronic journal articles to support academic tasks. A case study of the use of the Finnish National Electronic Library (FinELib), Information Research 12(1) (2006). Available at: http://InformationR.net/ir/12-1/paper285.html (accessed 21 April 2009).
[10] P. Vakkari, Trends in the use of digital libraries by scientists in 2000–2005: a case study of FinELib, Proceedings of the ASIS&T Annual Meeting (Austin, Texas, 3–9 November 2006).

Note 1 For the documentation and reports of the project see www.rin.ac.uk/use-ejournals
